Compiling with Continuations, Continued 



Andrew Kennedy 

Microsoft Research Cambridge 
akenn@microsoft.com 



Abstract 

We present a series of CPS-based intermediate languages suitable 
for functional language compilation, arguing that they have practi- 
cal benefits over direct-style languages based on A-normal form 
(ANF) or monads. Inlining of functions demonstrates the benefits
most clearly: in ANF-based languages, inlining involves a
re-normalization step that rearranges let expressions and possibly
introduces a new 'join point' function, and in monadic languages,
commuting conversions must be applied; in contrast, inlining in our 
CPS language is a simple substitution of variables for variables. 

We present a contification transformation implemented by sim- 
ple rewrites on the intermediate language. Exceptions are modelled 
using so-called 'double-barrelled' CPS. Subtyping on exception 
constructors then gives a very straightforward effect analysis for ex- 
ceptions. We also show how a graph-based representation of CPS 
terms can be implemented extremely efficiently, with linear-time 
term simplification. 

Categories and Subject Descriptors D.3.4 [Programming Lan- 
guages]: Processors - Compilers 

General Terms Languages 

Keywords Continuations, continuation passing style, monads, op- 
timizing compilation, functional programming languages 

1. Introduction 

Compiling with continuations is out of fashion. So report the au- 
thors of two classic papers on Continuation-Passing Style in recent 
retrospectives: 

"In 2002, then, CPS would appear to be a lesson aban- 
doned." (McKinley 2004; Shivers 1988) 

"Yet, compiler writers abandoned CPS over the ten years 
following our paper anyway." (McKinley 2004; Flanagan 
et al. 1993) 

This paper argues for a reprieve for CPS: "Compiler writers, give 
continuations a second chance." 

This conclusion is born of practical experience. In the MLj
and SML.NET whole-program compilers for Standard ML, co- 
implemented by the current author, we adopted a direct-style, 
monadic intermediate language (Benton et al. 1998, 2004b). In 
part, we were interested in effect-based program transformations, 
so monads were a natural choice for separating computations from
values in both terms and types. But, given the history of CPS, prob- 
ably there was also a feeling that "CPS is for call/cc", something 
that is not a feature of Standard ML. 

Recently, the author has re-implemented all stages of the 
SML.NET compiler pipeline to use a CPS-based intermediate lan- 
guage. Such a change was not undertaken lightly, amounting to 
roughly 25,000 lines of replaced or new code. There are many 
benefits: the language is smaller and more uniform, simplifica- 
tion of terms is more straightforward and extremely efficient, and 
advanced optimizations such as contification are more easily ex- 
pressed. We use CPS only because it is a good place to do opti- 
mization; we are not interested in first-class control in the source 
language (call/cc), or as a means of implementing other features 
such as concurrency. Indeed, as SML.NET targets .NET IL, a call- 
stack-based intermediate language with support for structured ex- 
ception handling, the compilation process can be summarized as 
"transform direct style (SML) into CPS; optimize CPS; transform 
CPS back to direct style (.NET IL)". 

1.1 Some history 

CPS. What's special about CPS? As Appel (1992, p2) put it,
"Continuation-passing style is a program notation that makes every
aspect of control flow and data flow explicit". An important
consequence is that full β-reduction (function inlining) is sound.
In contrast, for call-by-value languages based on the lambda
calculus, only the weaker β-value rule is sound. For example,
β-reduction cannot be applied to (λx.0) (f y) because f y may
have a side-effect or fail to terminate; but its CPS transform,
f y (λz.(λx.λk.k 0) z k) can be reduced without prejudice.
There are obvious drawbacks: the complexity of CPS terms; the
need to eliminate administrative redexes introduced by the CPS
transformation; and the cost of allocating closures for lambdas
introduced by the CPS transformation, unless some static analysis is
first applied. In fact, these drawbacks are more apparent than real:
the complexity of CPS terms is really a benefit, assigning useful
names to all intermediate computations and control points; the
CPS transformation can be combined with administrative reduction;
and by employing a syntactic separation of continuation- and
source-lambdas it is possible to generate good code directly from
CPS terms.

ANF. In their influential paper "The Essence of Compiling with
Continuations", Flanagan et al. (1993) observed that "fully developed
CPS compilers do not need to employ the CPS transformation
but can achieve the same results with a simple source-level
transformation". They proposed a direct-style intermediate language based
on A-normal forms, in which a let construct assigns names to every
intermediate computation. For example, the term above is represented
as let z = f y in (λx.0) z, to which β-reduction can be applied,
obtaining the semantically equivalent let z = f y in 0. This
style of language has become commonplace, not only in compilers,



but also to simplify the study of semantics for impure functional 
languages (Pitts 2005, §7.4). 

Monads. Very similar to ANF are so-called monadic languages 
based on Moggi's computational lambda calculus (Moggi 1991). 
Monads also make sequencing of computations explicit through a 
let x <= M in N binding construct, the main difference from ANF
being that let constructs can themselves be let-bound. The separation
of computations from values also provides a place to hang
effect annotations (Wadler and Thiemann 1998) which compilers 
can use to perform effect-based optimizing transformations (Ben- 
ton et al. 1998). 

1.2 The problem 

A-Normal Form is put forward as a compiler intermediate language 
with all the benefits of CPS (Flanagan et al. 1993, §6).
Unfortunately, the normal form is not preserved under useful compiler
transformations such as function inlining (β-reduction). Consider
the ANF term 

M = let x = (λy. let z = a b in c) d in e.

Now naive β-reduction produces

let x = (let z = a b in c) in e

which is not in normal form. The 'fix' is to define a more complex
notion of β-reduction that re-normalizes let constructs (Sabry and
Wadler 1997), in this case producing the normal form

let z = a b in (let x = c in e).

In contrast, the CPS transform of M, namely

(λy.λk.a b (λz.k c)) d (λx.k e),

simplifies by simple β-reduction to

a b (λz.(λx.k e) c).

As Sabry and Wadler explain in their study of the relationship be- 
tween CPS and monadic languages, "the CPS language achieves 
this normalization using the metaoperation of substitution which 
traverses the CPS term to locate k and replace it by the contin- 
uation thus effectively 'pushing' the continuation deep inside the 
term" (Sabry and Wadler 1997, § 8). 

Monadic languages permit let expressions to be nested, but 
incorporate so-called commuting conversions (cc's) such as 

let y <= (let x <= M in N) in P 
-> let x <= M in (let y <= N in P). 

ANF can be seen as a monadic language in which β-reduction is
combined with cc-normalization ensuring that terms remain in
cc-normal form.

All of the above seems quite benign, except for two things:

1. Commuting conversions increase the complexity of simplifying 
intermediate language terms. Reductions that strictly decrease 
the size of the term can be applied exhaustively on CPS terms, 
the number of reductions applied being linear in the size of the 
term. The equivalent ANF or monadic reductions must necessarily
involve commuting conversions, which leads to O(n²)
reductions in the worst case. Moreover, as Appel and Jim (1997)
have shown, given a suitable term representation, shrinking
reductions on CPS can be applied in time O(n); it is far from clear
how to amortize the cost of commuting conversions to obtain a 
similar measure for ANF or monadic simplification. 

2. Real programming languages include conditional expressions, 
or, more generally, case analysis on datatype constructors. 
These add considerable complexity to reductions on ANF or 



monadic terms. Consider the term 

let z = (λx.if x then a else b) c in M

This is in ANF, but β-reduction produces

let z = (if c then a else b) in M,

which is not in normal form because it contains a let-bound
conditional expression. To reduce it to normal form, one must
either apply a standard commuting conversion that duplicates
the term M, producing

if c then let z = a in M else let z = b in M,

or introduce a 'join-point' function for term M, to give

let k z = M

in if c then let z = a in k z else let z = b in k z.

Observe that k is simply a continuation! In our CPS language, 
k is already available in the original term, being the (named) 
continuation that is passed to the function to be inlined. The de- 
sire to share subterms almost forces some kind of continuation 
construct into the language. Better to start off with a language 
that makes continuations explicit. 
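
The re-normalization step discussed in this section can be made concrete with a small sketch. The code below is only an illustration, not the formulation of Sabry and Wadler or of any compiler mentioned here: it assumes a hypothetical tuple encoding of a direct-style language with let and application, assumes all bound names are distinct, and implements just the let/let commuting conversion.

```python
# Hypothetical encoding of a tiny direct-style language (an assumption of this sketch):
#   ("let", x, M, N)   stands for   let x = M in N
#   ("app", f, a)      stands for   f a
#   a plain string     stands for   a variable

def normalize(term):
    """Flatten nested lets:
         let x = (let z = M in N) in P  ==>  let z = M in (let x = N in P)
    Bound names are assumed distinct, so hoisting cannot capture variables."""
    if isinstance(term, tuple) and term[0] == "let":
        _, x, rhs, body = term
        rhs = normalize(rhs)
        body = normalize(body)
        if isinstance(rhs, tuple) and rhs[0] == "let":
            _, z, m, n = rhs
            # Commute the bindings, then re-normalize the let that moved inwards.
            return ("let", z, m, normalize(("let", x, n, body)))
        return ("let", x, rhs, body)
    return term

# The term produced by naive inlining above: let x = (let z = a b in c) in e
naive = ("let", "x", ("let", "z", ("app", "a", "b"), "c"), "e")
print(normalize(naive))
```

Note that the conversion revisits the binding it has just moved; this repeated traversal is where the extra cost of commuting conversions comes from.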

1.3 Contribution 

Much of the above has been said before by others, though not al- 
ways in the context of compilation; in this author's opinion, the 
most illuminating works are Appel (1992); Danvy and Filinski 
(1992); Hatcliff and Danvy (1994); Sabry and Wadler (1997). One 
contribution of this paper, then, is to draw together these observa- 
tions in a form accessible to implementers of functional languages. 

As is often the case, the devil is in the details, and so another 
purpose of this paper is to advocate a certain style of CPS that 
works very smoothly for compilation. Continuations are named and 
mandatory (just as every intermediate value is named, so is every 
control point), are second-class (they're not general lambdas), can 
represent basic blocks and loops, can be shared (typically, through 
common continuations of branches), represent exceptional control 
flow (using double-barrelled CPS), and are typeable (but can be 
used in untyped form too). By refining the types of exception 
values in the double-barrelled variant we get an effect system for 
exceptions 'for free'. 

We make two additional contributions. Following Appel and 
Jim (1997), we describe a graph-based representation of CPS terms 
that supports the application of shrinking β-reductions in time
linear in the size of the term. We improve on Appel and Jim's
selective use of back pointers for accessing variable binders, and
employ the union-find data structure to give amortized
near-constant-time access to binders for all variable occurrences. This leads to
efficient implementation of η-reductions and other transformations.
We present benchmark results comparing our graph-CPS represen- 
tation with (a) an earlier graphical representation of the original 
monadic language used in our compiler, and (b) the original func- 
tional representation of that language. 

Lastly, we show how to transform functions into local continu- 
ations using simple term rewriting rules. This approach to contif- 
ication avoids the need for a global dominator analysis (Fluet and 
Weeks 2001), and furthermore supports nested and first-class func- 
tions. 

2. Untyped CPS 

We start by defining an untyped continuation-passing language 
λCPS that supports non-recursive functions, the unit value, pairs,
and tagged values. Even for such a simple language, we can cover 
many of the issues and demonstrate advantages over alternative, 
direct-style languages. 



In Section 3, we add recursive functions, types, polymorphism,
exceptions, and effect annotations. At that point, the language
resembles a practical CPS-based intermediate language of the sort
that could form the core of a compiler for SML, Caml, or Scheme.

Grammar
letval x = V in K
let x = πᵢ x in K
letcont k x = K in L
Figure 1. Syntax and scoping rules for untyped language λCPS

Figure 1 presents the syntax of the untyped language. Here
ordinary variables are ranged over by x, y, f, and g, and continuation
variables are ranged over by k and j. Indices i range over 1,2. We
specify scoping of variables using well-formedness rules for values
and terms. Here Γ ⊢ V ok means that value V is well-formed in
the scope of a list of ordinary variables Γ, and Γ; Δ ⊢ K ok means
that term K is well-formed in the scope of a list of continuation
variables Δ and a list of ordinary variables Γ. Complete programs
are well-formed in the context of a distinguished top-level
continuation halt. (For the typed variant of our language there will be
typing rules with Γ and Δ generalized to typing contexts.)

We describe the constructs of the language in turn.

• The expression letval x = V in K binds a value V to a
variable x in the term K. This is the only way a value V
can be used in a term; arguments to functions, case scrutinees,
components of pairs, and so on, must all be simple variables.
Even the unit value () must be bound to a variable before being
used (in the full language, the same holds even for constants
such as 42). This means that there is no need for a general
notion of substitution: we only substitute variables for variables.
Notice also that there is no notion of redundant binding such as
let x = y in K.

• The expression let x = πᵢ y in K projects the i'th component
of a pair y and binds it to variable x in K.

• The expression letcont k x = K in L introduces a local
continuation k whose single argument is x and whose body
is K, to be used in term L. It corresponds to a labelled block in
traditional lower-level representations. In Section 3 we extend
local continuations with support for recursion, and so represent
loops directly.

• A continuation application k x corresponds to a jump (if k is a
local continuation) or a return (if k is the return continuation
of a function value). As with values, continuations must be
named: function application expressions and case constructs
do not have subterms, but instead mention continuations by
name. We need only ever substitute continuation variables for
continuation variables.

Local continuations can be applied more than once, as in

letcont j y = K in
letcont k₁ x₁ = (letval x = V₁ in j x) in
letcont k₂ x₂ = (letval x = V₂ in j x) in
case z of k₁ [] k₂

Here j is the common continuation, or 'join point' for branches
k₁ and k₂.

• The expression f k x is the application of a function f to an
argument x and a continuation k whose parameter receives the
result of applying the function. If k is the return continuation
for the nearest enclosing λ, then the application is a 'tail call'.
For example, consider the function value

λk x.(letcont j y = g k y in f j x).

Here g is in tail position, and f is not. In effect, we are defining
λx.g(f(x)).

• The construct case x of k₁ [] k₂ expects x to be bound to
a tagged value inᵢ y and then dispatches to the appropriate
continuation kᵢ, passing y as argument.

• Values include the unit value (), pairs (x, y) and tagged values
inᵢ x. Function values λk x.K include a return continuation k
and argument x. Note carefully the well-formedness rule (abs):
its continuation context includes only the return continuation k,
thus enforcing locality of continuations introduced by letcont.

The semantics is given by environment-style evaluation rules, 
presented in Figure 2. As is conventional, we define a syntax of 
run-time values, ranged over by r, supporting the unit value, pairs, 
constructor applications, and closures. Environments map variables 
to run-time values, and continuation variables to continuation val- 
ues. Continuation values are represented in a closure form, which 
gives the impression that they are first-class. An alternative would 
be to model stack frames more directly and thereby demonstrate 
that continuations are in fact just code pointers. For the purpose 
of simply defining the meaning of programs we prefer the closure- 
based semantics. 

The function ⟦·⟧ρ interprets a value expression in an environment
ρ. Terms are evaluated in an environment ρ; the only observation
that we can make of programs is termination, i.e. the
application of the top-level continuation halt to a unit value.
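
The closure-based evaluation rules of Figure 2 can be transcribed into a short interpreter. The following is a minimal sketch under stated assumptions rather than the author's implementation: the tuple encoding of terms and values, the names `run` and `interp_value`, and the `"HALT"` marker are all hypothetical. As a small generalization, `run` returns whatever value is passed to halt instead of observing termination only.

```python
# Assumed encoding of terms:
#   ("letval", x, V, K)        ("letproj", x, i, y, K)    let x = pi_i y in K
#   ("letcont", k, x, K, L)    ("appc", k, x)             k x
#   ("app", f, k, x)           ("case", x, k1, k2)
# Assumed encoding of values V:
#   ("unit",)   ("pair", x, y)   ("in", i, x)   ("lam", k, x, K)

def interp_value(v, env):
    """Interpretation of a value expression in environment env."""
    tag = v[0]
    if tag == "unit":
        return ()
    if tag == "pair":
        return ("pair", env[v[1]], env[v[2]])
    if tag == "in":
        return ("in", v[1], env[v[2]])
    if tag == "lam":
        return ("clo", env, v[1], v[2], v[3])   # function closure
    raise ValueError(tag)

def run(term):
    """Evaluate a closed term; every rule is a tail call, so a loop suffices."""
    env = {"halt": "HALT"}
    while True:
        tag = term[0]
        if tag == "letval":                      # (e-let)
            _, x, v, body = term
            env, term = {**env, x: interp_value(v, env)}, body
        elif tag == "letcont":                   # (e-letc)
            _, k, x, body, rest = term
            env, term = {**env, k: ("cont", env, x, body)}, rest
        elif tag == "letproj":                   # (e-proj)
            _, x, i, y, body = term
            # env[y] is ("pair", r1, r2), so index i in {1, 2} selects r_i
            env, term = {**env, x: env[y][i]}, body
        elif tag == "appc":                      # (e-appc) and (e-halt)
            _, k, x = term
            cont, arg = env[k], env[x]
            if cont == "HALT":
                return arg
            _, cenv, y, body = cont
            env, term = {**cenv, y: arg}, body
        elif tag == "case":                      # (e-case)
            _, x, k1, k2 = term
            _, i, r = env[x]
            cont = env[(k1, k2)[i - 1]]
            if cont == "HALT":
                return r
            _, cenv, y, body = cont
            env, term = {**cenv, y: r}, body
        elif tag == "app":                       # (e-app)
            _, f, k, x = term
            _, fenv, j, y, body = env[f]
            env, term = {**fenv, j: env[k], y: env[x]}, body
        else:
            raise ValueError(tag)

# letval u = () in letval t = in1 u in letval p = (t, u) in
# letval fst = (lam k x. let a = pi_1 x in k a) in
# letcont j r = halt r in fst j p
prog = ("letval", "u", ("unit",),
        ("letval", "t", ("in", 1, "u"),
         ("letval", "p", ("pair", "t", "u"),
          ("letval", "fst", ("lam", "k", "x",
                             ("letproj", "a", 1, "x", ("appc", "k", "a"))),
           ("letcont", "j", "r", ("appc", "halt", "r"),
            ("app", "fst", "j", "p"))))))
print(run(prog))
```

Because every rule has exactly one premise, the interpreter never needs a Python call stack for object-level calls, which reflects the observation that continuations here are just code pointers paired with an environment.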

2.1 CPS transformation 

To illustrate how the CPS-based language can be used for func- 
tional language compilation, consider a fragment of Standard ML 



Runtime values:       r ::= () | (r₁, r₂) | inᵢ r | (ρ, λk x.K)
Continuation values:  c ::= (ρ, λx.K)
Environments:         ρ ::= • | ρ, x ↦ r | ρ, k ↦ c

Interpretation of values:

⟦()⟧ρ = ()                 ⟦(x, y)⟧ρ = (ρ(x), ρ(y))
⟦inᵢ x⟧ρ = inᵢ(ρ(x))       ⟦λk x.K⟧ρ = (ρ, λk x.K)

Evaluation rules:

          ρ, x ↦ ⟦V⟧ρ ⊢ K ⇓
(e-let)   ---------------------------
          ρ ⊢ letval x = V in K ⇓

          ρ, k ↦ (ρ, λx.K) ⊢ L ⇓
(e-letc)  ---------------------------
          ρ ⊢ letcont k x = K in L ⇓

          ρ, y ↦ rᵢ ⊢ K ⇓
(e-proj)  ---------------------------   ρ(x) = (r₁, r₂)
          ρ ⊢ let y = πᵢ x in K ⇓

          ρ', y ↦ ρ(x) ⊢ K ⇓
(e-appc)  ---------------------------   ρ(k) = (ρ', λy.K)
          ρ ⊢ k x ⇓

          ρ', y ↦ r ⊢ K ⇓
(e-case)  ---------------------------   ρ(x) = inᵢ r,  ρ(kᵢ) = (ρ', λy.K)
          ρ ⊢ case x of k₁ [] k₂ ⇓

          ρ', j ↦ ρ(k), y ↦ ρ(x) ⊢ K ⇓
(e-app)   ---------------------------   ρ(f) = (ρ', λj y.K)
          ρ ⊢ f k x ⇓

(e-halt)  ---------------------------
          ρ ⊢ halt x ⇓

Figure 2. Evaluation rules for λCPS



Figure 3. Naive CPS transformation of toy ML into λCPS

whose expressions (ranged over by e) have the following syntax: 

ML ∋ e ::= x | e e' | fn x => e | (e,e') | #i e | ()
| inᵢ e | let val x = e in e' end
| case e of in1 x₁ => e₁ | in2 x₂ => e₂

We assume a datatype declared by

datatype ('a,'b) sum = in1 of 'a | in2 of 'b

Expressions in this language can be translated into untyped CPS 
terms using the function shown in Figure 3. This is an adapta- 



tion of the standard higher-order one-pass call-by-value transfor- 
mation (Danvy and Filinski 1992). An alternative, first-order, trans- 
formation is described by Danvy and Nielsen (2003). 

The transformation works by taking a translation-time function
κ as argument, representing the 'context' into which the
translation of the source term is embedded. For our language, the
context's argument is a variable, as all intermediate results are named.
Note some conventions used in Figure 3: translation-time lambda
abstraction is written using λ̲ and translation-time application is
written κ(. . .), to distinguish from λ and juxtaposition used to
denote lambda abstraction and application in the target language. Also
note that any object variables present in the target terms but not in 
the source are assumed fresh with respect to all other bound vari- 
ables. 

The translation is one-pass in the sense that it introduces no
'administrative reductions' (here, β-redexes for continuations) that
must be removed in a separate phase, except for let constructs (to
avoid these also would require analysis of the let expression; we
prefer to apply simplifying rewrites on the output of the
transformation). However, the translation is naive in two ways. First, it
introduces η-redexes for continuations when translating tail function
applications. For example, ⟦fn x => f (x,y)⟧ κ produces

letval g = λk x.(letval p = (x, y) in letcont j z = k z in f j p)
in κ(g)

whose η-redex (letcont j z = k z) can be eliminated to obtain the more
compact

letval g = (λk x. letval p = (x,y) in f k p) in κ(g).

Second, the translation of case duplicates the context; consider,
for example, f(case x of in1 x₁ => e₁ | in2 x₂ => e₂) whose
translation involves two calls to f.

The more sophisticated translation scheme of Figure 4 avoids 
both these problems; again, this is based on Danvy and Filinski 
(1992). The translation function ⟦·⟧ is as before, except (a) it
introduces a join point continuation to avoid context duplication for
case, and (b) for terms in tail position it uses an alternative
translation function (|·|) that takes an explicit continuation variable as
argument instead of a context.
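
The naive higher-order transformation described above can be sketched by letting host-language closures play the role of the translation-time context κ. This is an illustrative sketch only: the tuple encodings, the function name `to_cps`, and the fresh-name scheme are assumptions, the variable case is assumed to be κ(x), and the case construct is omitted for brevity. Fresh names are assumed not to clash with source variables.

```python
import itertools

# Assumed source encoding:
#   ("var", x)  ("unit",)  ("app", e1, e2)  ("fn", x, e)  ("pair", e1, e2)
#   ("proj", i, e)  ("in", i, e)  ("let", x, e1, e2)
# Assumed target encoding:
#   ("letval", x, V, K)  ("letproj", x, i, y, K)  ("letcont", k, x, K, L)
#   ("appc", k, x)  ("app", f, k, x);  values ("unit",) ("pair", x, y) ("in", i, x) ("lam", k, x, K)

def to_cps(expr, top="halt"):
    """Naive one-pass CPS transformation; the program's result is passed to `top`."""
    counter = itertools.count(1)

    def fresh(base):
        return f"{base}{next(counter)}"

    def cps(e, kappa):
        # kappa is a Python function from a variable name to a target term.
        tag = e[0]
        if tag == "var":
            return kappa(e[1])
        if tag == "unit":
            x = fresh("x")
            return ("letval", x, ("unit",), kappa(x))
        if tag == "app":
            def with_fun(z1):
                def with_arg(z2):
                    k, x = fresh("k"), fresh("x")
                    return ("letcont", k, x, kappa(x), ("app", z1, k, z2))
                return cps(e[2], with_arg)
            return cps(e[1], with_fun)
        if tag == "pair":
            def with_fst(z1):
                def with_snd(z2):
                    x = fresh("x")
                    return ("letval", x, ("pair", z1, z2), kappa(x))
                return cps(e[2], with_snd)
            return cps(e[1], with_fst)
        if tag == "in":
            def with_arg(z):
                x = fresh("x")
                return ("letval", x, ("in", e[1], z), kappa(x))
            return cps(e[2], with_arg)
        if tag == "proj":
            def with_arg(z):
                x = fresh("x")
                return ("letproj", x, e[1], z, kappa(x))
            return cps(e[2], with_arg)
        if tag == "fn":
            f, k = fresh("f"), fresh("k")
            body = cps(e[2], lambda z: ("appc", k, z))
            return ("letval", f, ("lam", k, e[1], body), kappa(f))
        if tag == "let":
            j = fresh("j")
            return ("letcont", j, e[1], cps(e[3], kappa),
                    cps(e[2], lambda z: ("appc", j, z)))
        raise ValueError(tag)

    return cps(expr, lambda z: ("appc", top, z))

# fn x => f (x, y)
example = ("fn", "x", ("app", ("var", "f"), ("pair", ("var", "x"), ("var", "y"))))
print(to_cps(example))
```

In the output for this example, the continuation bound by letcont merely forwards its argument to the function's return continuation; this is the η-redex that the tail-aware translation avoids.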

2.2 Rewrites 

After translating from source language to intermediate language, 
most functional language compilers perform a number of optimiza- 
tion phases that are implemented as transformations on intermedi- 
ate language terms. Some phases are specific (for example, arity- 
raising of functions, or hoisting expressions out of loops) but usu- 
ally there is some set of general rewrites based on standard re- 
ductions in the lambda-calculus. Figure 5 presents some general 
rewrites for our CPS-based language. The rewrites look more com- 
plicated than the equivalent reductions in the lambda-calculus be- 
cause the naming of intermediate values forces introduction and 
elimination forms apart. For example, β-reduction on pairs, which
in the lambda calculus is simply πᵢ (e₁, e₂) → eᵢ, has to support
an intervening context C. In practice, the rewrites are not hard to
implement. In functional style, value bindings (e.g. pairs) are stored in
an environment which is accessed at the reduction site (e.g. a pro- 
jection). In imperative style, bindings are accessed directly through 
pointers, as we shall see in Section 4.1. 

The payoff from this style of rewrite is the selective use of β
rules. For example, in a lambda-calculus extended with a let
construct, one might perform the reduction let p = (x,y) in M →
M[(x,y)/p] but this would be undesirable unless every
substitution of (x, y) for p in M produced a redex. In our language,
letval p = (x,y) in . . . k p . . . let z = π₁ p in K reduces to



⟦·⟧ : ML → (Var → CTm) → CTm

⟦()⟧ κ = letval x = () in κ(x)
⟦e₁ e₂⟧ κ = ⟦e₁⟧ (λ̲z₁.
                ⟦e₂⟧ (λ̲z₂.
                  letcont k x = κ(x) in z₁ k z₂))
⟦(e₁,e₂)⟧ κ = ⟦e₁⟧ (λ̲z₁.
                  ⟦e₂⟧ (λ̲z₂.
                    letval x = (z₁,z₂) in κ(x)))
⟦inᵢ e⟧ κ = ⟦e⟧ (λ̲z. letval x = inᵢ z in κ(x))
⟦#i e⟧ κ = ⟦e⟧ (λ̲z. let x = πᵢ z in κ(x))
⟦fn x => e⟧ κ = letval f = λk x.⟦e⟧ (λ̲z.k z) in κ(f)
⟦let val x = e₁ in e₂ end⟧ κ =
    letcont j x = ⟦e₂⟧ κ in ⟦e₁⟧ (λ̲z.j z)
⟦case e of in1 x₁ => e₁ | in2 x₂ => e₂⟧ κ =
    ⟦e⟧ (λ̲z. letcont k₁ x₁ = ⟦e₁⟧ κ in
              letcont k₂ x₂ = ⟦e₂⟧ κ in
              case z of k₁ [] k₂)



⟦·⟧ : ML → (Var → CTm) → CTm

⟦fn x => e⟧ κ = letval f = λk x.(|e|) k in κ(f)
⟦let val x = e₁ in e₂ end⟧ κ = letcont j x = ⟦e₂⟧ κ in (|e₁|) j
⟦case e of in1 x₁ => e₁ | in2 x₂ => e₂⟧ κ
    = ⟦e⟧ (λ̲z. letcont j x = κ(x) in
                letcont k₁ x₁ = (|e₁|) j in
                letcont k₂ x₂ = (|e₂|) j in
                case z of k₁ [] k₂)

(|·|) : ML → CVar → CTm

(|x|) k = k x
(|e₁ e₂|) k = ⟦e₁⟧ (λ̲x₁.⟦e₂⟧ (λ̲x₂.x₁ k x₂))
(|fn x => e|) k = letval f = λj x.(|e|) j in k f
(|(e₁,e₂)|) k = ⟦e₁⟧ (λ̲x₁.⟦e₂⟧ (λ̲x₂.letval x = (x₁,x₂) in k x))
(|inᵢ e|) k = ⟦e⟧ (λ̲z. letval x = inᵢ z in k x)
(|()|) k = letval x = () in k x
(|#i e|) k = ⟦e⟧ (λ̲z. let x = πᵢ z in k x)
(|let val x = e₁ in e₂ end|) k = letcont j x = (|e₂|) k in (|e₁|) j
(|case e of in1 x₁ => e₁ | in2 x₂ => e₂|) k
    = ⟦e⟧ (λ̲z. letcont k₁ x₁ = (|e₁|) k in
                letcont k₂ x₂ = (|e₂|) k in
                case z of k₁ [] k₂)

Figure 4. Tail CPS transformation (changes and additions only shown)



C ::= [] | letval x = V in C | let x = πᵢ y in C |
      letval x = λk x.C in K | letcont k x = C in K |
      letcont k x = K in C

DEAD-CONT   letcont k x = L in K → K                (k not free in K)
DEAD-VAL    letval x = V in K → K                   (x not free in K)

β-CONT      letcont k x = K in C[k y]
            → letcont k x = K in C[K[y/x]]
β-FUN       letval f = λk x.K in C[f j y]
            → letval f = λk x.K in C[K[y/x, j/k]]
β-CASE      letval x = inᵢ y in C[case x of k₁ [] k₂]
            → letval x = inᵢ y in C[kᵢ y]
β-PAIR      letval x = (x₁,x₂) in C[let y = πᵢ x in K]
            → letval x = (x₁,x₂) in C[K[xᵢ/y]]

β-CONT-LIN  letcont k x = K in C[k y]
            → C[K[y/x]]                             (if k not free in C)
β-FUN-LIN   letval f = λk x.K in C[f j y]
            → C[K[y/x, j/k]]                        (f ≠ y, f not free in C)

η-CONT      letcont k x = j x in K → K[j/k]
η-FUN       letval f = λk x.g k x in K → K[g/f]
η-PAIR      let xᵢ = πᵢ x in C[let xⱼ = πⱼ x
              in C'[letval y = (x₁,x₂) in K]]
            → let xᵢ = πᵢ x in C[let xⱼ = πⱼ x
              in C'[K[x/y]]]                        ({i,j} = {1,2})
η-CASE      letcont kᵢ x₁ = (letval y₁ = inᵢ x₁ in k y₁) in
              C[letcont kⱼ x₂ = (letval y₂ = inⱼ x₂ in k y₂) in
                C'[case x of k₁ [] k₂]]
            → letcont kᵢ x₁ = (letval y₁ = inᵢ x₁ in k y₁) in
              C[letcont kⱼ x₂ = (letval y₂ = inⱼ x₂ in k y₂) in
                C'[k x]]                            ({i,j} = {1,2})

Figure 5. General rewrites for λCPS



letval p = (x, y) in . . . k p . . . K[x/z] which applies the β-PAIR
rule to π₁ p but preserves other occurrences of p.

It is easy to show that all rewrites preserve well-formedness of 
terms. In particular, the scoping of local continuations is respected. 

The β-FUN and β-CONT reductions are inlining transformations
for functions and continuations. The remainder of the reductions
we call shrinking reductions, as they strictly decrease the size
of terms (Appel and Jim 1997). The β-CONT-LIN and β-FUN-LIN
reductions are special cases of β-reduction for linear uses of a
variable, in effect combining DEAD- and β-reductions. Shrinking
reductions can be applied exhaustively on a term, and are typically
used to 'clean up' a term after some special-purpose global trans- 
formation such as arity-raising or monomorphisation. Clearly the 
number of such reductions will be linear in the size of the term; 
moreover, using the representation of terms described in Section 4 
it is possible to perform such reductions in linear time. 
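
Two of the shrinking reductions, β-PAIR and DEAD-VAL, can be sketched in the functional style mentioned earlier, with pair bindings kept in an environment that is consulted at projection sites. This is a minimal sketch under assumptions, not the simplifier of MLj or SML.NET: the tuple encoding and the names `simplify` and `occurs` are hypothetical, bound names are assumed unique, and the naive occurrence check makes it quadratic rather than linear-time.

```python
# Assumed encoding of terms:
#   ("letval", x, V, K)   ("letproj", x, i, y, K)   ("letcont", k, x, K, L)
#   ("appc", k, x)        ("app", f, k, x)          ("case", x, k1, k2)
# Assumed encoding of values: ("unit",) ("pair", x, y) ("in", i, x) ("lam", k, x, K)

def occurs(x, t):
    """True if name x appears in term or value t (bound names assumed unique)."""
    if isinstance(t, tuple):
        return any(occurs(x, s) for s in t[1:])
    return t == x

def simplify(t, pairs=None, sub=None):
    pairs = pairs or {}   # variable -> (x1, x2) for pair bindings in scope
    sub = sub or {}       # variable-for-variable substitution built by beta-PAIR
    r = lambda v: sub.get(v, v)
    tag = t[0]
    if tag == "letval":
        _, x, v, body = t
        if v[0] == "pair":
            v = ("pair", r(v[1]), r(v[2]))
            pairs = {**pairs, x: (v[1], v[2])}
        elif v[0] == "in":
            v = ("in", v[1], r(v[2]))
        elif v[0] == "lam":
            v = ("lam", v[1], v[2], simplify(v[3], pairs, sub))
        body = simplify(body, pairs, sub)
        # DEAD-VAL: drop the binding if x is no longer used.
        return ("letval", x, v, body) if occurs(x, body) else body
    if tag == "letproj":
        _, x, i, y, body = t
        y = r(y)
        if y in pairs:
            # beta-PAIR: replace x by the i'th component, a variable.
            return simplify(body, pairs, {**sub, x: pairs[y][i - 1]})
        return ("letproj", x, i, y, simplify(body, pairs, sub))
    if tag == "letcont":
        _, k, x, kbody, rest = t
        return ("letcont", k, x, simplify(kbody, pairs, sub),
                simplify(rest, pairs, sub))
    if tag == "appc":
        return ("appc", t[1], r(t[2]))
    if tag == "app":
        return ("app", r(t[1]), t[2], r(t[3]))
    if tag == "case":
        return ("case", r(t[1]), t[2], t[3])
    raise ValueError(tag)

# letval p = (x, y) in let z = pi_1 p in k z
t1 = ("letval", "p", ("pair", "x", "y"),
      ("letproj", "z", 1, "p", ("appc", "k", "z")))
print(simplify(t1))
```

When p has another use besides the projection, the same function rewrites only the projection and keeps the binding of p, matching the selective behaviour of β-PAIR described above.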

2.3 Comparison with a monadic language 

The original implementations of the MLj and SML.NET compil- 
ers used monadic languages inspired by Moggi's computational 
lambda calculus (Moggi 1991). Figure 6 presents syntax for a
monadic language λmon and selected reduction rules.

The defining feature of monadic languages is that sequencing 
of computations is made explicit through the let construct;
values are converted into trivial computations using the val construct.
Monadic languages share with CPS languages the property that
familiar β-reduction on functions is sound, as evaluation of the
function argument is made explicit through let. But there are drawbacks,
as we outlined in the Introduction. (An orthogonal issue - as for
CPS based languages - is whether values can appear anywhere
except inside val. In λmon, for ease of presentation, we permit values
to be embedded in applications, pairs, and so on, whereas for λCPS
we insist that they are named. The difference shows up in the
reduction rules, which in λCPS make use of contexts. It should be
noted that the drawbacks of monadic languages that we are about 
to discuss are unaffected by this choice.) 

Problem 1: need for let/let commuting conversion. The basic 
reductions listed in Figure 6 have corresponding reductions in CPS.
The let construct itself has β and η rules which correspond to
β-CONT and η-CONT for λCPS (consider the CPS transforms of
the terms). In contrast to CPS-based languages, though, monadic 



Grammar

MTm ∋ M, N ::= val v | let x <= M in N | v w | πᵢ v
             | case v of in₁ x₁.M₁ [] in₂ x₂.M₂
MVal ∋ v, w ::= x | λx.M | (v, w) | inᵢ v | ()

Reductions

β-LET    let x <= val v in M → M[v/x]
η-LET    let x <= M in val x → M
CC-LET   let x₂ <= (let x₁ <= M₁ in M₂) in N
         → let x₁ <= M₁ in (let x₂ <= M₂ in N)
CC-CASE  let x <= (case v of in₁ x₁.M₁ [] in₂ x₂.M₂) in N
         → let f <= val λx.N
           in case v of in₁ x₁.let x <= M₁ in f x
                     [] in₂ x₂.let x <= M₂ in f x
β-PAIR   πᵢ (v₁, v₂) → vᵢ
β-FUN    (λx.M) v → M[v/x]
β-CASE   case inᵢ v of in₁ x₁.M₁ [] in₂ x₂.M₂ → Mᵢ[v/xᵢ]

Figure 6. Syntax and selected rewrites for monadic language λmon



languages include a so-called commuting conversion, expressing 
associativity for let: 

CC-LET let x₂ <= (let x₁ <= M₁ in M₂) in N
→ let x₁ <= M₁ in (let x₂ <= M₂ in N)

This reduction plays a vital role in exposing further reductions.
Consider the source expression

#1 ((fn x => (g x,x)) y)

Its translation into λmon is

let z₂ <= (λx. let z₁ <= g x in val (z₁,x)) y in π₁ z₂.

Now suppose that we apply β-FUN, to get

let z₂ <= (let z₁ <= g y in val (z₁,y)) in π₁ z₂.

In order to make any further progress, we must use CC-LET to get

let z₁ <= g y in let z₂ <= val (z₁, y) in π₁ z₂.

Now we can apply β-LET and β-PAIR to get let z₁ <= g y in z₁
which further reduces by η-LET to g y.

Solution 1: Use CPS. Now take the original source expression 
and translate it into our CPS-based language, with k representing 
the enclosing continuation. 

letval f = λj₁ x.
  (letcont j₂ z₁ = (letval z₂ = (z₁,x) in j₁ z₂) in g j₂ x)
in letcont j₃ z₃ = (let z₄ = π₁ z₃ in k z₄)
in f j₃ y

Applying β-FUN-LIN gives the following, with y substituted for x
and j₃ for j₁:

letcont j₃ z₃ = (let z₄ = π₁ z₃ in k z₄)
in letcont j₂ z₁ = (letval z₂ = (z₁, y) in j₃ z₂) in g j₂ y

and by β-CONT-LIN on j₃ we get

letcont j₂ z₁ =
  (letval z₂ = (z₁,y) in let z₄ = π₁ z₂ in k z₄)
in g j₂ y.

Finally, use of β-PAIR and DEAD-VAL produces letcont j₂ z₁ =
k z₁ in g j₂ y which reduces by η-CONT to g k y. All reductions
were simple uses of β and η rules, without the need for the
additional 'administrative' reduction CC-LET.



Problem 2: quadratic blowup. The CC-LET reduction seems in- 
nocent enough. But observe that it is not a shrinking reduction - so 
it's not immediately clear whether reduction will terminate.
Fortunately, the combination of CC-LET and shrinking β/η-reductions
of Figure 6 does terminate (Lindley 2005), and moreover there is
a formal correspondence between the reductions of the monadic
language and CPS (Hatcliff and Danvy 1994). Unfortunately, the
order in which conversions are applied is critical to the efficiency
of simplification by reduction. Consider the following term in λmon:

let fₙ <= val (λxₙ.let y

let fₙ₋₁ <= val (λxₙ₋₁.let yₙ₋₁ <= fₙ xₙ₋₁ in g yₙ₋₁) in

let f₁ <= val (λx₁.let y₁ <= f₂ x₁ in g y₁) in f₁ a

If (linear) β-FUN is applied to all functions in this term, followed
by a sequence of CC-LET reductions, then no redexes remain
after O(n) reductions. If, however, the commuting conversions
are interleaved with β-FUN, then O(n²) reductions are required.
(There are other examples where it is better to apply commuting 
conversions first.) Although this is a pathological example, the 
'simplifier' was a major bottleneck in the MLj and SML.NET 
compilers (Benton et al. 2004a), in part (we believe) because of 
the need to perform commuting conversions. 

Solution 2: Use CPS. It is interesting to note that monadic terms 
can be translated into CPS in linear-time; shrinking reductions can 
be applied exhaustively there in linear-time (see Section 4); and the 
term can be translated back into monadic form in linear-time. Therefore the
quadratic blowup we saw above is not fundamental, and there may
be some means of amortizing the cost of commuting conversions
so that exhaustive reductions can be performed in linear time.
Nevertheless, it is surely better to have the term in CPS from the start,
and enjoy the benefit of linear-time simplification. 

Problem 3: need for let /case commuting conversion. Matters 
become more complicated with conditionals or case constructs. 
Consider the source expression 

p'((?((fn x => case x of ini x\ => (xi ,xz) I in2 X2 => g" x) y)) 

Its translation into A mon is 

let 2 -4= (Ax. case x of ini xi.val (xi, xz) | in 2 x 2 .p" x) y in 
let z' <4= g 2 in g' 2'. 

This reduces by /3-FUN to 

let 2 <= (case y of ini xi.val (xi, xz) | in 2 x 2 .p" y) in 
let 2' <4= g 2 in g' 2'. 

At this point, we want to 'float' the case expression out of the let. 
The proof-theoretic commuting conversion that expresses this 
rewrite is 

    let x ⇐ (case v of in1 x1.M1 [] in2 x2.M2) in N
    → case v of in1 x1.(let x ⇐ M1 in N) [] in2 x2.(let x ⇐ M2 in N)

This can have the effect of exposing more redexes; unfortunately,
it also duplicates N, which is not so desirable. So instead, compilers
typically adopt a variation of this commuting conversion that
shares N between the branches, creating a so-called join point
function:

    CC-CASE   let x ⇐ (case v of in1 x1.M1 [] in2 x2.M2) in N
              → let f ⇐ val (λx.N) in
                case v of in1 x1.let x ⇐ M1 in f x
                       [] in2 x2.let x ⇐ M2 in f x



Applying this to our example produces the result

    let f ⇐ val (λz. let z' ⇐ g z in g' z') in
    case y of
        in1 x1.(let z ⇐ val (x1, x3) in f z)
     [] in2 x2.(let z ⇐ g'' y in f z).

As observed earlier, join points such as f are just continuations.
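
The CC-CASE rewrite can be sketched in SML on a small monadic syntax tree. This is a minimal illustration, not the MLj or SML.NET implementation: the datatype and constructor names are hypothetical, and the name of the join point is supplied by the caller and assumed to be fresh.

```sml
(* Hypothetical syntax for a fragment of the monadic language. *)
datatype value =
    Var of string
  | Pair of value * value
  | Lam of string * term                   (* \x.M *)
and term =
    Val of value                           (* val v *)
  | App of value * value                   (* v w *)
  | Let of string * term * term            (* let x <= M in N *)
  | Case of value * (string * term) * (string * term)

(* CC-CASE: bind N once as a join point f instead of duplicating it
   in both branches. The name f is assumed not to occur in the term. *)
fun ccCase f (Let (x, Case (v, (x1, m1), (x2, m2)), n)) =
      Let (f, Val (Lam (x, n)),
           Case (v, (x1, Let (x, m1, App (Var f, Var x))),
                    (x2, Let (x, m2, App (Var f, Var x)))))
  | ccCase _ tm = tm

(* The running example after beta-FUN. *)
val term0 =
  Let ("z",
       Case (Var "y",
             ("x1", Val (Pair (Var "x1", Var "x3"))),
             ("x2", App (Var "g''", Var "y"))),
       Let ("z'", App (Var "g", Var "z"), App (Var "g'", Var "z'")))

val term1 = ccCase "f" term0
```

Here `term1` has the shape of the result displayed above: the continuation of the case expression is bound once to `f` and called from each branch.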

Solution 3: Use CPS. Consider the CPS transformation of the
original source expression, with k being the enclosing return
continuation.

    letcont j' z' = g' k z' in
    letcont j z = g j' z in
    letval f = λj'' x.
      (letcont k1 x1 = (letval z'' = (x1,x3) in j'' z'') in
       letcont k2 x2 = g'' j'' x in
       case x of k1 [] k2)
    in f j y

Applying β-FUN-LIN immediately produces the following term,
in which j has been substituted for j'' and y for x:

    letcont j' z' = g' k z' in
    letcont j z = g j' z in
    letcont k1 x1 = (letval z'' = (x1,x3) in j z'') in
    letcont k2 x2 = g'' j y in
    case y of k1 [] k2

There is no need to apply anything analogous to CC-CASE, or to
introduce a join point: the original term already had one, namely j,
which was substituted for the return continuation j'' of the function.
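
To make the point concrete, here is a small SML sketch of β-FUN-LIN as a pure renaming. It is an illustration under stated assumptions rather than the paper's simplifier: variables are strings, all bound variables are assumed distinct (so no capture can occur), and the rewrite is only attempted on a term of the exact shape `letval f = λk x.K in f j y`.

```sml
(* Hypothetical untyped CPS syntax with string variables. *)
datatype cval =
    Pair of string * string
  | Lam of string * string * ctm            (* \k x.K *)
and ctm =
    LetVal of string * cval * ctm
  | LetCont of string * string * ctm * ctm  (* letcont k x = K in L *)
  | AppCont of string * string              (* k x *)
  | App of string * string * string         (* f k x *)
  | Case of string * string * string        (* case x of k1 [] k2 *)

(* Apply a renaming, given as an association list, to one variable. *)
fun rn s x =
  case List.find (fn (a, _) => a = x) s of
    SOME (_, b) => b
  | NONE => x

(* Substitution only ever replaces variables by variables. *)
fun subst s tm =
  case tm of
    LetVal (x, v, k) => LetVal (x, substVal s v, subst s k)
  | LetCont (k, x, body, rest) => LetCont (k, x, subst s body, subst s rest)
  | AppCont (k, x) => AppCont (rn s k, rn s x)
  | App (f, k, x) => App (rn s f, rn s k, rn s x)
  | Case (x, k1, k2) => Case (rn s x, rn s k1, rn s k2)
and substVal s v =
  case v of
    Pair (x, y) => Pair (rn s x, rn s y)
  | Lam (k, x, body) => Lam (k, x, subst s body)

(* beta-FUN-LIN for the shape  letval f = \k x.K in f j y,
   assuming this call is the only occurrence of f. *)
fun betaFunLin (tm as LetVal (f, Lam (k, x, body), App (f', j, y))) =
      if f = f' then subst [(k, j), (x, y)] body else tm
  | betaFunLin tm = tm

(* The function body from the example above. *)
val body =
  LetCont ("k1", "x1",
           LetVal ("z''", Pair ("x1", "x3"), AppCont ("j''", "z''")),
  LetCont ("k2", "x2", App ("g''", "j''", "x"),
  Case ("x", "k1", "k2")))

val inlined = betaFunLin (LetVal ("f", Lam ("j''", "x", body), App ("f", "j", "y")))
```

No re-normalization step follows the substitution: the result is already a well-formed CPS term.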

The absence of explicit join points in monadic languages is
an annoyance in itself. Because join points are represented as
ordinary functions, a separate static analysis is needed to
determine that such functions can be compiled efficiently as basic
blocks.

Explicitly named local continuations in CPS have the advantage 
that locality is immediate from the syntax, and preserved under 
transformation; furthermore traditional intra-procedural compiler 
optimizations (such as those performed on SSA representations) 
can be adapted to operate on functions in CPS form. 

2.4 Comparison with ANF 

Flanagan et al. (1993) propose an alternative to CPS which they call
A-Normal Form, or ANF for short. This is defined as the image
of the composition of the CPS, administrative normalization and
inverse CPS transformations.

[Diagram: the source language CS is mapped by the CPS transformation and β-normalization into CPS terms, and back by the inverse CPS transformation to A(CS); a dotted arrow A leads directly from CS to A(CS).]

The source language CS is Core Scheme (corresponding to our
fragment of ML), and their CPS transformation composed with
β-normalization is equivalent to our one-pass transformation ⟦·⟧ of
Figure 4.

The language A(CS) corresponds precisely to CC-LET/CC-CASE
normal forms in λ_mon. We can express these normal forms
by a grammar:

    ATm  ∋ A, B ::= R | let x ⇐ R in A
                  | case v of in1 x1.A1 [] in2 x2.A2
    ACmp ∋ R    ::= v w | π_i v | v
    AVal ∋ v, w ::= x | λx.A | (v, w) | in_i v | ()
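
As a rough illustration, the grammar can be transcribed into mutually recursive SML datatypes. The constructor names below are hypothetical. The point of the sketch is that a let can only bind a computation R, so a nested `let x ⇐ (let …) in …` or a let-bound case is not representable at all.

```sml
(* Hypothetical transcription of the ANF grammar. *)
datatype aval =
    Var of string
  | Lam of string * atm            (* \x.A *)
  | Pair of aval * aval            (* (v, w) *)
  | In of int * aval               (* in_i v *)
  | Unit
and acmp =
    App of aval * aval             (* v w *)
  | Proj of int * aval             (* pi_i v *)
  | Ret of aval                    (* v *)
and atm =
    Cmp of acmp                    (* R *)
  | Let of string * acmp * atm     (* let x <= R in A *)
  | Case of aval * (string * atm) * (string * atm)

(* g (f y) has to name its intermediate result:
   let z <= f y in g z *)
val example =
  Let ("z", App (Var "f", Var "y"), Cmp (App (Var "g", Var "z")))

(* Number of let-bindings on all paths of a term
   (not looking inside lambda bodies). *)
fun lets (Cmp _) = 0
  | lets (Let (_, _, a)) = 1 + lets a
  | lets (Case (_, (_, a), (_, b))) = lets a + lets b
```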



Instead of going via a CPS language, the transformation into ANF
can be performed in one pass, as suggested by the dotted line A in
the diagram above.¹ A similar transformation has been studied by
Danvy (2003).

As Flanagan et al. (1993) suggest, the "back end of an A-normal 
form compiler can employ the same code generation techniques 
that a CPS compiler uses". However, as we mentioned in the In- 
troduction, it is not so apparent whether ANF is ideally suited to 
optimization. After all, it is not even closed under the usual rule
for β-reduction, (λx.A) v → A[v/x]. As Sabry and Wadler
(1997) later explained, it is necessary to combine substitution with
re-normalization to get a sound rule for β-reduction: essentially the
repeated application of CC-LET. They do not consider conditionals
or case constructs, but presumably to maintain terms in ANF it
is necessary to normalize with respect to CC-LET and CC-CASE
following function inlining.

It is clear, then, that ANF suffers all the same problems that af- 
fect monadic languages: the need for (non-shrinking) commuting 
conversions, quadratic blowup of 'linear' reductions, and the ab- 
sence of explicit join points. 

3. Typed CPS with exceptions 

We now add types and other features to the language of Section 2.
In the untyped world, we can model recursion using a call-by-value
fixed-point combinator. For a typed language, we must add explicit
support for recursive functions, which, in any case, is more
practical. Moreover, we would like to express recursive continuations
too, in order to represent loops. Finally, to support exceptions,
functions in the extended language take two continuations:
an exception-handler continuation, and a return continuation. This
is the so-called double-barrelled continuation-passing style
(Thielecke 2002).

Figure 7 presents the syntax and typing rules for the extended
language λ^T_CPS. Types of values are ranged over by τ, σ and include
unit, a type of exceptions, products, sums and functions. (To save
space, we omit constructs for manipulating exception values.)
Continuation types have the form ¬τ, which is interpreted as
'continuations accepting values of type τ'. Note that for simplicity of
presentation we do not annotate terms with types; it is an easy exercise to
add sufficient annotations to determine unique typing derivations.
Typing judgments for values have the form Γ ⊢ V : τ in which Γ
maps variables to value types. Judgments for terms have the form
Γ; Δ ⊢ K ok in which the additional context Δ maps continuation
variables to continuation types. Complete programs are typed
in the context of a single top-level continuation halt accepting unit
values.

We consider each construct in turn. 

• The letval construct is as before, with the obvious typing rule 
and associated value typing rules. Likewise for projections. 

• The letcont construct is generalized to support mutually recur- 
sive continuations. These represent loops directly. Local con- 
tinuations are also used for exception handlers. 

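• The letfun construct introduces a set of mutually recursive
functions; each function takes a return continuation k, an
exception handler continuation h, and an argument x. As a language
construct, there is nothing special about the handler continuation
except that its type is fixed to be ¬exn, and so a function
type τ → σ is constructed from the argument type τ and the
type ¬σ of the return continuation. What really distinguishes
exceptions is (a) their role in the translation from source
language into CPS, and (b) typical strategies for generating code.

¹ Though, curiously, the 'A-normalization algorithm' in (Flanagan et al.
1993, Fig. 9) does not actually normalize terms, as it leaves let-bound
conditionals alone.

Grammar

    (value types)                  τ, σ ::= unit | exn | τ × σ | τ + σ | τ → σ
    (values)         CVal ∋ V, W    ::= () | (x, y) | in_i x
    (terms)          CTm ∋ K, L     ::= letval x = V in K | let y = π_i x in K
                                      | letcont C in K | letfun F in K
                                      | k x | f k h x | case x of k1 [] k2
    (function def.)  FunDef ∋ F     ::= f k h x = K
    (cont. def.)     ContDef ∋ C    ::= k x = K

Variables

$$\frac{x{:}\tau \in \Gamma}{\Gamma \vdash x : \tau}\ (\text{var})
\qquad
\frac{k{:}\neg\tau \in \Delta}{\Delta \vdash k : \neg\tau}\ (\text{contvar})$$

Well-typed terms

$$\frac{\{\Gamma, x_i{:}\tau_i;\ \Delta, k_1{:}\neg\tau_1, \ldots, k_n{:}\neg\tau_n \vdash K_i\ \text{ok}\}_{1 \le i \le n}
\qquad \Gamma;\ \Delta, k_1{:}\neg\tau_1, \ldots, k_n{:}\neg\tau_n \vdash L\ \text{ok}}
{\Gamma;\ \Delta \vdash \text{letcont } k_1\, x_1 = K_1, \ldots, k_n\, x_n = K_n \text{ in } L\ \text{ok}}\ (\text{letc})$$

$$\frac{\{\Gamma, x_i{:}\tau_i, f_1{:}\tau_1 \to \sigma_1, \ldots, f_n{:}\tau_n \to \sigma_n;\ k_i{:}\neg\sigma_i, h_i{:}\neg\text{exn} \vdash K_i\ \text{ok}\}_{1 \le i \le n}
\qquad \Gamma, f_1{:}\tau_1 \to \sigma_1, \ldots, f_n{:}\tau_n \to \sigma_n;\ \Delta \vdash L\ \text{ok}}
{\Gamma;\ \Delta \vdash \text{letfun } f_1\, k_1\, h_1\, x_1 = K_1, \ldots, f_n\, k_n\, h_n\, x_n = K_n \text{ in } L\ \text{ok}}\ (\text{letrec})$$

$$\frac{\Gamma \vdash V : \tau \qquad \Gamma, x{:}\tau;\ \Delta \vdash K\ \text{ok}}
{\Gamma;\ \Delta \vdash \text{letval } x = V \text{ in } K\ \text{ok}}\ (\text{letv})
\qquad
\frac{\Gamma \vdash x : \tau \qquad \Delta \vdash k : \neg\tau}
{\Gamma;\ \Delta \vdash k\, x\ \text{ok}}\ (\text{appc})$$

$$\frac{\Gamma \vdash x : \tau_1 \times \tau_2 \qquad \Gamma, y{:}\tau_i;\ \Delta \vdash K\ \text{ok}}
{\Gamma;\ \Delta \vdash \text{let } y = \pi_i\, x \text{ in } K\ \text{ok}}\ (\text{proj})\quad i \in 1,2$$

$$\frac{\Gamma \vdash x : \tau_1 + \tau_2 \qquad \Delta \vdash k_1 : \neg\tau_1 \qquad \Delta \vdash k_2 : \neg\tau_2}
{\Gamma;\ \Delta \vdash \text{case } x \text{ of } k_1\ []\ k_2\ \text{ok}}\ (\text{case})$$

$$\frac{\Gamma \vdash f : \tau \to \sigma \qquad \Delta \vdash k : \neg\sigma \qquad \Delta \vdash h : \neg\text{exn} \qquad \Gamma \vdash x : \tau}
{\Gamma;\ \Delta \vdash f\, k\, h\, x\ \text{ok}}\ (\text{app})$$

Well-typed values

$$\frac{\Gamma \vdash x : \tau \qquad \Gamma \vdash y : \sigma}{\Gamma \vdash (x, y) : \tau \times \sigma}\ (\text{pair})
\qquad
\frac{\Gamma \vdash x : \tau_i}{\Gamma \vdash \text{in}_i\, x : \tau_1 + \tau_2}\ (\text{tag})\quad i \in 1,2
\qquad
\frac{}{\Gamma \vdash () : \text{unit}}\ (\text{unit})$$

Well-typed programs

$$\frac{\{\};\ \text{halt}{:}\neg\text{unit} \vdash K\ \text{ok}}{}\ (\text{prog})$$

Figure 7. Syntax and typing rules for typed language λ^T_CPS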

• Continuation application k x is as before. Now there are four 
possibilities for k: it may be a recursive or non-recursive occur- 
rence of a letcont-bound continuation, compiled as a jump, it 
may be the return continuation, or it may be a handler continu- 
ation, which is interpreted as raising an exception. 

• Function application f k h x includes a handler continuation
argument h. If k is the return continuation for the nearest
enclosing function, and h is its handler continuation, then
the application is a tail call. If k is a local continuation and h
is the handler continuation for the enclosing function, then
the application is a non-tail call without an explicit exception
handler, so exceptions are propagated to the context.
Otherwise, h is an explicit handler for exceptions raised by
the function. (Other combinations are possible; for example in
letfun f k h x = C[g h h y] in K the function application is
essentially raise (g y) in a tail position.)

• Branching using case is as before. 
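
The case analysis on function applications can be sketched as a small SML classifier. This is an illustrative reading of the bullet above, not code from the compiler: the type `callKind`, the record fields, and the representation of continuation variables as strings are all assumptions.

```sml
(* Hypothetical classification of an application f k h x. *)
datatype callKind = TailCall | NonTailCall | HandledCall | Other

(* retK, hdlK: return and handler continuations of the nearest
   enclosing function; locals: letcont-bound continuations in scope. *)
fun classify {retK, hdlK, locals} (k, h) =
  let
    fun isLocal c = List.exists (fn l => l = c) locals
  in
    if k = retK andalso h = hdlK then TailCall
    else if isLocal k andalso h = hdlK then NonTailCall
    else if isLocal h then HandledCall      (* explicit handler *)
    else Other                              (* e.g. g h h y *)
  end

val ctx = {retK = "k", hdlK = "h", locals = ["j", "h'"]}
val c1 = classify ctx ("k", "h")     (* tail call *)
val c2 = classify ctx ("j", "h")     (* non-tail call, exceptions propagate *)
val c3 = classify ctx ("j", "h'")    (* call under an explicit handler *)
val c4 = classify ctx ("h", "h")     (* raise (g y) in tail position *)
```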
3.1 CPS transformation 

We can extend the fragment of ML described in Section 2.1 with
exceptions and recursive functions:

    ML ∋ e    ::= … | raise e | e1 handle x => e2
                | let fun d in e end
    MLDef ∋ d ::= f x = e

The revised CPS transformation is shown in Figure 8 (see (Kim
et al. 1998) for the selective use of a double-barrelled CPS
transformation). Both ⟦·⟧ and (|·|) take an additional argument: a
continuation h for the exception handler in scope. Then raise e is
translated as an application of h. For e1 handle x => e2 a local handler
continuation h' is declared whose body is the translation of e2; this
is then used as the handler passed to the translation function for e1.
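
A minimal SML sketch of the handler-passing part of this transformation is given below, restricted to variables, application, raise and handle. It follows the corresponding clauses of Figure 8 but is not the paper's implementation: the datatypes, the function names `cps` and `tail`, and the counter-based `fresh` name supply are assumptions made for illustration.

```sml
(* Hypothetical source and target syntax for a small fragment. *)
datatype ml =
    V of string
  | Ap of ml * ml
  | Raise of ml
  | Handle of ml * string * ml               (* e1 handle x => e2 *)

datatype ctm =
    LetCont of string * string * ctm * ctm   (* letcont k x = K in L *)
  | AppCont of string * string               (* k x *)
  | App of string * string * string * string (* f k h x *)

(* Fresh-name supply (an assumption of this sketch). *)
val counter = ref 0
fun fresh base = (counter := !counter + 1; base ^ Int.toString (!counter))

(* [[e]] h kappa: the context kappa is a meta-level function. *)
fun cps e h (kappa : string -> ctm) : ctm =
  case e of
    V x => kappa x
  | Ap (e1, e2) =>
      cps e1 h (fn x1 => cps e2 h (fn x2 =>
        let val k = fresh "k" val x = fresh "x"
        in LetCont (k, x, kappa x, App (x1, k, h, x2)) end))
  | Raise e' => cps e' h (fn z => AppCont (h, z))
  | Handle (e1, x, e2) =>
      let val j = fresh "j" val h' = fresh "h" val y = fresh "x"
      in LetCont (j, y, kappa y,
                  LetCont (h', x, tail e2 h j, tail e1 h' j))
      end
(* (|e|) h k: the context is a continuation variable k. *)
and tail e h k : ctm =
  case e of
    V x => AppCont (k, x)
  | Ap (e1, e2) => cps e1 h (fn x1 => cps e2 h (fn x2 => App (x1, k, h, x2)))
  | Raise e' => cps e' h (fn z => AppCont (h, z))
  | Handle (e1, x, e2) =>
      let val h' = fresh "h"
      in LetCont (h', x, tail e2 h k, tail e1 h' k) end

(* (raise y) handle z => z, in tail position with handler h and return k *)
val example = tail (Handle (Raise (V "y"), "z", V "z")) "h" "k"
```

In `example` the raise has become an application of the local handler continuation, so an ordinary β-reduction on continuations is enough to simplify it to `k y`.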

3.2 Rewrites 

The rewrites of Figure 5 can be adapted easily to λ^T_CPS, and extended
with transformations such as 'loop unrolling':

    β-REC      letfun f1 k1 h1 x1 = C[f_i k h x]
                      f2 k2 h2 x2 = K2
                      …
                      f_n k_n h_n x_n = K_n
               in K
             → letfun f1 k1 h1 x1 = C[K_i[k/k_i, h/h_i, x/x_i]]
                      f2 k2 h2 x2 = K2
                      …
                      f_n k_n h_n x_n = K_n
               in K

    β-RECCONT  letcont k1 x1 = C[k_i x]
                       k2 x2 = K2
                       …
                       k_n x_n = K_n
               in K
             → letcont k1 x1 = C[K_i[x/x_i]]
                       k2 x2 = K2
                       …
                       k_n x_n = K_n
               in K

There are no special rewrites for exception handling, e.g.
corresponding to (raise M) handle x.N → let x ⇐ M in N.
Standard β-reduction on functions and continuations gives us this for
free. For example, the CPS transform of

    let fun f x = raise x in f y handle z => (z,z) end

is

    letfun f k' h' x = h' x
    in letcont j z = (letval z' = (z, z) in k z') in f k j y

which reduces by β-FUN and β-CONT to letval z' = (y,y) in k z'.



⟦·⟧ : ML → CVar → (Var → CTm) → CTm

    ⟦x⟧ h κ                   = κ(x)
    ⟦e1 e2⟧ h κ               = ⟦e1⟧ h (λx1. ⟦e2⟧ h (λx2. letcont k x = κ(x) in x1 k h x2))
    ⟦fn x => e⟧ h κ           = letfun f k h' x = (|e|) h' k in κ(f)
    ⟦(e1,e2)⟧ h κ             = ⟦e1⟧ h (λx1. ⟦e2⟧ h (λx2. letval x = (x1,x2) in κ(x)))
    ⟦in_i e⟧ h κ              = ⟦e⟧ h (λz. letval x = in_i z in κ(x))
    ⟦()⟧ h κ                  = letval x = () in κ(x)
    ⟦#i e⟧ h κ                = ⟦e⟧ h (λz. let x = π_i z in κ(x))
    ⟦let val x = e1 in e2 end⟧ h κ
                              = letcont j x = ⟦e2⟧ h κ in (|e1|) h j
    ⟦let fun d in e end⟧ h κ  = letfun ⟦d⟧ in ⟦e⟧ h κ
    ⟦raise e⟧ h κ             = ⟦e⟧ h (λz. h z)
    ⟦e1 handle x => e2⟧ h κ   = letcont j x = κ(x) in
                                letcont h' x = (|e2|) h j in (|e1|) h' j
    ⟦case e of in1 x1 => e1 | in2 x2 => e2⟧ h κ
                              = ⟦e⟧ h (λz. letcont j x = κ(x) in
                                           letcont k1 x1 = (|e1|) h j in
                                           letcont k2 x2 = (|e2|) h j in
                                           case z of k1 [] k2)

⟦·⟧ : MLDef → FunDef

    ⟦f x = e⟧                 = f k h x = (|e|) h k

(|·|) : ML → CVar → CVar → CTm

    (|x|) h k                 = k x
    (|e1 e2|) h k             = ⟦e1⟧ h (λx1. ⟦e2⟧ h (λx2. x1 k h x2))
    (|fn x => e|) h k         = letval f = λj x.(|e|) h j in k f
    (|(e1,e2)|) h k           = ⟦e1⟧ h (λx1. ⟦e2⟧ h (λx2. letval x = (x1,x2) in k x))
    (|in_i e|) h k            = ⟦e⟧ h (λz. letval x = in_i z in k x)
    (|()|) h k                = letval x = () in k x
    (|#i e|) h k              = ⟦e⟧ h (λz. let x = π_i z in k x)
    (|let val x = e1 in e2 end|) h k
                              = letcont j x = (|e2|) h k in (|e1|) h j
    (|let fun d in e end|) h k
                              = letfun ⟦d⟧ in (|e|) h k
    (|raise e|) h k           = ⟦e⟧ h (λz. h z)
    (|e1 handle x => e2|) h k = letcont h' x = (|e2|) h k in (|e1|) h' k
    (|case e of in1 x1 => e1 | in2 x2 => e2|) h k
                              = ⟦e⟧ h (λz. letcont k1 x1 = (|e1|) h k in
                                           letcont k2 x2 = (|e2|) h k in
                                           case z of k1 [] k2)

Figure 8. Tail CPS transformation for λ^T_CPS



Likewise, commuting conversions are not required, in contrast
with monadic languages, where in order to define well-behaved
conversions it is necessary to generalize the usual M handle x ⇒ N
construct to try y ⇐ M in N1 unless x ⇒ N2, incorporating a
success 'continuation' N1 (Benton and Kennedy 2001).

3.3 Other features 

It is straightforward to extend λ^T_CPS with other features useful for
compiling full-scale programming languages such as Standard ML.

• Recursive types of the form μα.τ can be supported by adding
suitable introduction and elimination constructs: a value fold x
and a term let x = unfold y in K.

• Binary products and sums generalize to the n-ary case. For
optimizing representations it is common for intermediate languages
to support functions with multiple arguments and results, and
constructors taking multiple arguments. This is easy: function
definitions have the form f k h x̄ = K, and continuations have
the form k x̄ = K (where x̄ is a sequence of variables) and are
used for passing multiple results and for case branches where
the constructor takes multiple arguments.

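• Polymorphic types of the form ∀α.τ can be added. Typing
contexts are extended with a set of type variables 𝒱. Then to
support ML-style let-polymorphism, each value binding construct
(letval, letfun, and projection) must incorporate polymorphic
generalization. For example:

$$\frac{\mathcal{V}, \vec{\alpha};\ \Gamma \vdash V : \tau \qquad \mathcal{V};\ \Gamma, x{:}\forall\vec{\alpha}.\tau;\ \Delta \vdash K\ \text{ok}}
{\mathcal{V};\ \Gamma;\ \Delta \vdash \text{letval } x = V \text{ in } K\ \text{ok}}$$

For elimination, we simply adapt the variable rule (var) to
incorporate polymorphic specialization:

$$\frac{x{:}\forall\vec{\alpha}.\tau \in \Gamma}{\Gamma \vdash x : \tau[\vec{\sigma}/\vec{\alpha}]}\ (\text{var})$$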
3.4 Effect analysis and transformation 

The use of continuations in an explicit 'handler-passing style' lends
itself very nicely to an effect analysis for exceptions. Suppose, for
simplicity, that there is a finite number of exception constructors
ranged over by E. We make the following changes to λ^T_CPS:

• We introduce exception set types of the form {E1, …, E_n},
representing exception values built with any of the constructors
E1, …, E_n. Set inclusion induces a subtype ordering on
exception types, with top type exn representing any exception,
and bottom type {} representing no exception.

• The types of handler continuations in function definitions are
refined to describe the exceptions that the function is permitted
to throw. For example:

    (1) letfun f k (h:¬{}) x = K in …
    (2) letfun f k (h:¬exn) x = K in …
    (3) letfun f k (h:¬{E,E'}) x = K in …

The type of (1) tells us that K never raises an exception, in
(2) the function can raise any exception, and in (3) the function
might raise E or E'.

• Now that handlers are annotated with more precise types, the
function types must reflect this too. We write $\tau \xrightarrow{\sigma'} \sigma$ for the
type of functions that either return a result of type σ or raise an
exception of type σ' <: exn. Subtyping on function types and
continuation types is specified by the following rules:

$$\frac{\tau_2 <: \tau_1 \qquad \sigma_1 <: \sigma_2 \qquad \sigma'_1 <: \sigma'_2}
{\tau_1 \xrightarrow{\sigma'_1} \sigma_1 \;<:\; \tau_2 \xrightarrow{\sigma'_2} \sigma_2}
\qquad
\frac{\sigma_2 <: \sigma_1}{\neg\sigma_1 <: \neg\sigma_2}$$

Exception effects enable effect-specific transformations (Benton
and Buchlovsky 2007). Suppose that the type of f is $\tau \xrightarrow{\{E_1\}} \sigma$.
Then we can apply a 'dead-handler' rewrite on the following:

    letcont h:¬{E1, E2} x = (case x of E1.k1 [] E2.k2) in f k h y
    → letcont h:¬{E1} x = (case x of E1.k1) in f k h y

In fact, there is nothing exception-specific about this rewrite: it is 
just employing refined types for constructed values. The use of 
continuations has given us exception effects 'for free'. 
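
The subtype ordering and the dead-handler check can be sketched in a few lines of SML. This is a toy model under stated assumptions: exception constructors are represented by their names, the datatypes `exnTy` and `ty` are hypothetical, and only unit and function types are modelled.

```sml
(* Exception types: exn is top, Set [] is bottom. *)
datatype exnTy = Exn | Set of string list

(* Set inclusion induces the subtype ordering on exception types. *)
fun subExn (_, Exn) = true
  | subExn (Exn, Set _) = false
  | subExn (Set xs, Set ys) =
      List.all (fn x => List.exists (fn y => x = y) ys) xs

(* Value types: Fun (t, e, s) stands for a function from t to s
   that may raise exceptions of type e. *)
datatype ty = UnitTy | Fun of ty * exnTy * ty

(* Contravariant in the argument, covariant in effect and result. *)
fun sub (UnitTy, UnitTy) = true
  | sub (Fun (t1, e1, s1), Fun (t2, e2, s2)) =
      sub (t2, t1) andalso subExn (e1, e2) andalso sub (s1, s2)
  | sub _ = false

(* Constructors handled by a handler that the callee can never raise. *)
fun deadBranches (effect, handled) =
  case effect of
    Exn => []
  | Set es =>
      List.filter (fn c => not (List.exists (fn e => e = c) es)) handled

(* f may raise only E1, but the handler has branches for E1 and E2. *)
val dead = deadBranches (Set ["E1"], ["E1", "E2"])
```

In the rewrite displayed above, `dead` identifies the E2 branch as the one that can be removed.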

4. Implementing CPS 

Many compilers for functional languages represent intermediate 
language terms in a functional style, as instances of an algebraic 
datatype of syntax trees, and manipulate them functionally. For
example, the language λ^U_CPS can be implemented by an SML datatype,
here using integers for variables, with all bound variables distinct:

    type Var = int and CVar = int
    datatype CVal =
      Unit | Pair of Var * Var | Inj of int * Var
    | Lam of CVar * Var * CTm
    and CTm =
      LetVal of Var * CVal * CTm
    | LetProj of Var * int * Var * CTm
    | LetCont of CVar * Var * CTm * CTm
    | AppCont of CVar * Var
    | App of Var * CVar * Var
    | Case of Var * CVar * CVar

Rewrites such as those of Figure 5 are then implemented by a
function that maps terms to terms, applying as many rewrites as
possible in a single pass. Here is a typical fragment that applies the
β-PAIR and DEAD-VAL reductions:

    fun simp census env S K =
      case K of
        LetVal (x, V, L) =>
          if count (census, x) = 0   (* Dead-Val *)
          then simp census env S L
          else LetVal (x, simpVal census env S V,
                       simp census (addEnv (env, x, V)) S L)
      | LetProj (x, 1, y, L) =>
          let val y' = applySubst S y
          in case lookup (env, y') of
               (* Beta-Pair *)
               Pair (z, _) =>
                 simp census env (extendSubst S (x, z)) L
             | _ =>
                 LetProj (x, 1, y', simp census env S L)
          end

In addition to the term K itself, the simplifier function simp 
takes a parameter env that tracks letval bindings, a parameter S 
used to substitute variables for variables and a parameter census 
that maps each variable to the number of occurrences of the vari- 
able, computed prior to applying the function. 



The census becomes out-of-date as reductions are applied, and
this may cause reductions to be missed until the census is
recalculated and simp applied again. For example, the β-PAIR reduction
may trigger a DEAD-VAL in an enclosing letval binding (consider
letval x = (y1,y2) in … let z = π1 x in … where x occurs only
once). Maintaining accurate census information as rewrites are
performed can increase the number of reductions performed in a single
pass (Appel and Jim 1997), but even with up-to-date census
information, it is not possible to perform shrinking reductions
exhaustively in a single pass, so a number of iterations may be required
before all redexes have been eliminated. In the worst case, this leads
to O(n²) behaviour.
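
The staleness problem can be illustrated with a small census computation over the datatype shown earlier. This is a sketch only: the list-based census and the helper names `uses`, `mkCensus` and `count` are assumptions (a realistic implementation would use an array or hash table), and variables and continuation variables are assumed to share one numbering.

```sml
type Var = int and CVar = int
datatype CVal =
    Unit | Pair of Var * Var | Inj of int * Var
  | Lam of CVar * Var * CTm
and CTm =
    LetVal of Var * CVal * CTm
  | LetProj of Var * int * Var * CTm
  | LetCont of CVar * Var * CTm * CTm
  | AppCont of CVar * Var
  | App of Var * CVar * Var
  | Case of Var * CVar * CVar

(* Collect every use (non-binding occurrence) of a variable. *)
fun usesVal v =
  case v of
    Unit => []
  | Pair (x, y) => [x, y]
  | Inj (_, x) => [x]
  | Lam (_, _, body) => uses body
and uses tm =
  case tm of
    LetVal (_, v, l) => usesVal v @ uses l
  | LetProj (_, _, y, l) => y :: uses l
  | LetCont (_, _, body, l) => uses body @ uses l
  | AppCont (k, x) => [k, x]
  | App (f, k, x) => [f, k, x]
  | Case (x, k1, k2) => [x, k1, k2]

type census = Var list
fun mkCensus tm : census = uses tm
fun count (c : census, x) = List.length (List.filter (fn y => y = x) c)

(* letval x = (y1, y2) in let z = pi_1 x in k z
   with x = 1, y1 = 2, y2 = 3, z = 4, k = 0 *)
val t0 = LetVal (1, Pair (2, 3), LetProj (4, 1, 1, AppCont (0, 4)))
(* After Beta-Pair the projection is gone and z is replaced by y1. *)
val t1 = LetVal (1, Pair (2, 3), AppCont (0, 2))

val staleCount = count (mkCensus t0, 1)  (* census taken before the rewrite *)
val freshCount = count (mkCensus t1, 1)  (* census recomputed afterwards *)
```

A pass that consults only the census taken from `t0` still sees one use of x and so misses the DEAD-VAL reduction that the recomputed census exposes.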

What's more, each pass essentially copies the entire term, leav- 
ing the original term to be picked up by the garbage collector. This 
can be expensive. (Nonetheless, the simplicity of our CPS lan- 
guage, with substitutions only of variables for variables, and the 
lack of commuting conversions as are required in ANF or monadic 
languages, leads to a very straightforward simplifier algorithm.) 

4.1 Graphical representation of terms 

An alternative is to represent the term using a graph, and to perform
rewrites by destructive update of the graph. Appel and Jim (1997)
devised a representation for which exhaustive application of the
shrinking β-reductions of Figure 5 takes time linear in the size of
the term. We improve on their representation to support efficient
η-reductions and other transformations. The representation has three
ingredients.

1. The term structure itself is a doubly-linked tree. Every subterm 
has an up-link to its immediately enclosing term. This supports 
constant time replacement, deletion, and insertion of subterms. 

2. Each bound variable contains a link to one of its free occur- 
rences, or is null if the variable is dead, and the free occurrences 
themselves are connected together in a doubly-linked circular 
list. This permits the following operations to be performed in 
constant time: 

• Determining whether a bound variable has zero, one, or 
more than one occurrence, and if it has only one occurrence, 
locating that occurrence. 

• Determining whether a free variable is unique. 

• Merging two occurrence lists. 

Furthermore, we separate recursive and non-recursive uses of
variables; in essence, instead of letfun f k h x = K in L we
write let f = rec g k h x. K[g/f] in L. This lets us detect
DEAD-∗ and β-∗-LIN reductions.

3. Free occurrences are partitioned into same-binder equivalence
classes by using the union-find data structure (Cormen et al.
2001).² The representative in each equivalence class (that is, the
root of the union-find tree) is linked to its binding occurrence.

This supports amortized near-constant time access to the binder 
(the find operation) and merging of occurrence lists (the union 
operation). 

Substitution of variable x for variable y is implemented in near- 
constant time by (a) merging the circular lists of occurrences so 
that x now points to the merged list, and (b) applying a union 
operation so that the occurrences of y are now associated with the 
binder for x. 
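
The union-find part of this scheme can be sketched as follows. This is a simplified model, not the SML.NET data structure: binders are represented by their names, the circular occurrence lists and up-links are omitted, and union by rank is left out for brevity, so only path compression is shown. All function names are hypothetical.

```sml
(* An occurrence is a mutable node: either the representative of its
   class, which records the binder, or a link towards it. *)
datatype node = Root of string | Link of node ref
type occ = node ref

fun newBinder name : occ = ref (Root name)   (* first occurrence *)
fun addOcc (rep : occ) : occ = ref (Link rep)

(* find with path compression *)
fun find (c : occ) : occ =
  case !c of
    Root _ => c
  | Link parent =>
      let val r = find parent
      in c := Link r; r end

fun binderOf c =
  case !(find c) of
    Root name => name
  | Link _ => raise Fail "find returned a non-root"

(* Substitute x for y: every occurrence of y becomes an occurrence
   of x by a single union of the two classes. *)
fun substitute (xOcc, yOcc) =
  let
    val rx = find xOcc
    val ry = find yOcc
  in
    if rx = ry then () else ry := Link rx
  end

val x1 = newBinder "x"
val y1 = newBinder "y"
val y2 = addOcc y1
val beforeSubst = binderOf y2
val () = substitute (x1, y1)
val afterSubst = binderOf y2
```

After the call to `substitute`, looking up any former occurrence of y reaches the binder for x without visiting the occurrences individually.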

Consider the following value term, with doubly-linked tree
structure and union-find structure implicit but with binder-to-free
pointer shown as a dotted arrow and circular occurrence lists shown
as solid arrows:

[Figure: graph of a term of the form λ k x. … let p = (x, y) in …, containing a projection from p bound to z; arrows not reproduced.]

² Readers familiar with type inference may recall that union-find underpins
the almost-linear time algorithm for term unification (Baader and Nipkow
1998).

Now suppose that we wish to apply β-PAIR to the projection π1 p.
Using the find operation on the union-find structure we can locate
the pair (x, y) in near constant time. Now we substitute x for z by
disconnecting z's binder from its circular list and connecting x's
occurrence list in its place, and merging the two lists, in constant
time. At the same time, we apply the union operation to merge the
binder equivalence classes (not shown).

[Figure: the same term after the occurrence lists of x and z have been merged; arrows not reproduced.]

Finally we remove the projection itself, deleting the occurrence of p
from the circular list, again in constant time:

[Figure: the term after the projection has been removed; arrows not reproduced.]

One issue remains: the classical union-find data structure does not 
support deletion. There are recent techniques that extend union-find 
with amortized near-constant time deletion (Kaplan et al. 2002). 
However, the representation is non-trivial, and might add unaccept- 
able overhead to the union and find operations, so we chose instead 
a simpler solution: do nothing! Deleted occurrences remain in the 
union-find data structure, possibly as root nodes, or as nodes on the 
path to the root. In theory, the efficiency of rewriting is then depen- 
dent on the 'peak' size of the term, not its current size, but we have 
not found this to be a problem in practice. 

Each of the shrinking reductions of Figure 5 can be imple- 
mented in almost-constant time using our graph representation. To 
put these together and apply them exhaustively on a term, we fol- 
low Appel and Jim (1997): 

• First sweep over the term, detecting redexes and collecting them 
in a worklist. 

• Then pull items off the worklist one at a time (in any order),
applying the appropriate rewrite, and adding new redexes to
the worklist that are triggered by the rewrite. For example,
the removal of a free occurrence (as can happen for multiple
variables when applying DEAD-VAL) can induce a DEAD-∗
reduction (if no occurrences remain) or a β-∗-LIN reduction
(if only a single occurrence remains).



In the current implementation, the worklist is represented as a 
queue, but it should be possible to thread it through the term itself. 
Shrinking reductions could then be performed with constant space 
overhead. 

4.2 Comparison with Appel/Jim 

The representation of Appel and Jim (1997) did not make use of 
union-find to locate binders. Instead, (a) the circular list of variable 
occurrences included the bound occurrence, thus giving constant 
time access to the binder in the case that the free variable is unique, 
and (b) for letval-bound variables, each free occurrence contained 
an additional pointer to its binder. When performing a substitution 
operation, these binder links must be updated, using time linear in 
the number of occurrences; fortunately, for any particular variable 
this can happen only once during shrinking reductions, as letval- 
bound variables cannot become rebound. Thus the cost is amortized 
across the shrinking reductions. 

Unfortunately the lack of binder occurrences for non-letval-bound
variables makes other optimizations, such as η-reduction, less
efficient. Take an instance of η-PAIR:

    let x1 = π1 x in C[let x2 = π2 x in C'[letval y = (x1, x2) in K]]
    → let x1 = π1 x in C[let x2 = π2 x in C'[K[x/y]]]

Just to locate the binder for x1 and x2 would take time linear in the
number of occurrences.

Our use of union-find gives us efficient implementation of all 
shrinking reductions, and of other transformations too; moreover, 
when analysing efficiency we need not be concerned whether vari- 
ables are letval-bound or not. 

4.3 Performance results 

We have modified the SML.NET compiler to make use of a typed 
CPS intermediate language only mildly more complex than that 
shown in Figure 7. It employs the graphical representation of terms 
described above; in particular, the simplifier performs shrinking 
reductions exhaustively on a term representing the whole program, 
and it is invoked a total of 15 times during compilation. 

Table 1 presents some preliminary benchmark results show- 
ing average time spent in simplification, time spent in monomor- 
phisation, and time spent in unit-removal (e.g. transformation of 
unit*int values to int). We compare (a) the released version of 
SML.NET, implementing a monadic intermediate language (MIL) 
and functional-style simplification algorithm, (b) the Appel/Jim- 
style graph representation adapted to MIL terms implemented by 
Lindley (Benton et al. 2004a; Lindley 2005), and (c) the new graph- 
based CPS representation with union-find. Tests were run on a
3 GHz Pentium 4 PC with 1 GB of RAM running Windows Vista.
The SML.NET compiler is implemented in Standard ML and com- 
piled using the MLton optimizing compiler, which generates high 
quality code from both functional and imperative coding styles - so 
giving both techniques a fair shot. 

As can be seen from the figures, the graph-based simplifier for 
the monadic language is significantly faster than the functional sim- 
plifier - and although all times are small, bear in mind that the 
simplifier is run many times during compilation. Unit removal is 
roughly comparable in performance across implementations. Inter- 
estingly, the graph-based CPS implementation of monomorphisa- 
tion runs up to twice as slowly as the functional monadic imple- 
mentation. We conjecture that this is because monomorphisation 
necessarily copies (and specializes) terms, and CPS terms tend to 
be larger than MIL terms, and the graph representation is larger 
still. 

These figures come with a caveat: the comparison is somewhat
"apples and oranges". There are differences between the MIL,
g-MIL and g-CPS representations that are unrelated to monads or
CPS. Future work is to make a fairer comparison, implementing
a functional version of the CPS terms, and perhaps also a precise
monadic analogue.

Table 1. Optimization times (in seconds)

    Benchmark   Lines    Phase    MIL    g-MIL   g-CPS
    raytrace     2,500   Simp     0.12   0.01    0.01
    mlyacc       6,200   Simp     0.44   0.02    0.02
    smlnet      80,000   Simp     7.29   0.29    0.15
                         Mono     0.75   n/a     1.41
                         Deunit   0.76   1.3     0.6
    hamlet      20,000   Simp     0.97   0.08    0.04
                         Mono     0.15   n/a     0.19
                         Deunit   0.12   0.16    0.14

5. Contification 

Our CPS languages make a syntactic distinction between functions 
and local continuations. The former are typically compiled as heap- 
allocated closures or as known functions, whilst the latter can al- 
ways be compiled as inline code with continuation applications 
compiled as jumps. For efficiency it is therefore desirable to trans- 
form functions into continuations, a process that has been termed 
contification (Fluet and Weeks 2001). 

Functions can be contified when they always return to the same 
place. Consider the following code written in the subset of SML 
studied in Section 2: 

    let fun f x = ...
    in g (case d of in1 d1 => f y | in2 d2 => f d2) end

If f returns at all, it must pass control to g. Here, this is obvious,
but for more complex examples it is not so apparent. Now consider
its CPS transform:

    letval f = (λk x. ⋯ k ⋯) in
    letcont k0 w = g r w in
    letcont j1 d1 = f k0 y in
    letcont j2 d2 = f k0 d2 in
    case d of j1 [] j2

It is clear that f is always passed the same continuation k0, and
so, unless it diverges, it must return through k0 and so pass control
to g. We can transform f into a local continuation, as follows:

    letcont k0 w = g r w in
    letcont j x = ⋯ k0 ⋯ in
    letcont j1 d1 = j y in
    letcont j2 d2 = j d2 in
    case d of j1 [] j2

We have done three things: (a) we have replaced the function f by
a continuation j, deleting the return continuation at both definition
and call sites, (b) we have substituted the argument k0 for the
formal k in the body of f, and (c) we have moved j so that it is
in the scope of k0.

Fluet and Weeks (2001) use the dominator tree of a program's 
call graph to contify programs that consist of a collection of 
mutually-recursive first-order functions. They show that their al- 
gorithm is optimal: no contifiable functions remain after applying 
the transformation. Their dominator-based analysis can be adapted 
to our CPS languages, and is simpler to describe in this context be- 
cause all function definitions and uses have a named continuation 
(Fluet and Weeks use named continuations only for non-tail calls). 
When applied to top-level functions, the transformation is simpler 
too, but in the presence of first-class functions and general block 
structure the transformation becomes significantly more complex 
to describe. 



We prefer an approach based on incremental transformation, in 
essence repeatedly applying the rewrite illustrated above until no 
further rewrites are possible. We consider first the case of non- 
recursive functions, then generalize to mutually-recursive func- 
tions, and conclude by relating our technique to dominator-based 
contification. 

5.1 Non-recursive functions 

In the untyped language λ^U_CPS without recursion, it is particularly
straightforward to spot contifiable functions: they are those for
which all occurrences are applications with the same continuation
argument. We define the following rewrite:

    CONT (f not free in C, D and D minimal):

    letval f = λk x.K in C[D[f k0 x1, …, f k0 x_n]]
    → C[letcont j x = K[k0/k] in D[j x1, …, j x_n]]

Here C is a single-hole context as presented in Figure 5 and D is a
multi-hole context whose formalization we omit.
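
The side condition of CONT, that every occurrence of f is a call with the same continuation argument, can be checked by a single traversal. The following SML sketch covers only this analysis, not the rewrite itself; the datatype is a simplified stand-in for the untyped language with string variables, and the helper names are hypothetical.

```sml
(* Hypothetical untyped CPS syntax with string variables. *)
datatype cval =
    Pair of string * string
  | Lam of string * string * ctm            (* \k x.K *)
and ctm =
    LetVal of string * cval * ctm
  | LetCont of string * string * ctm * ctm
  | AppCont of string * string              (* k x *)
  | App of string * string * string         (* f k x *)
  | Case of string * string * string

(* Uses of f other than in function position are recorded as NONE. *)
fun other f xs = List.map (fn _ => NONE) (List.filter (fn x => x = f) xs)

(* All occurrences of f: SOME k for a call f k x, NONE otherwise. *)
fun occs f tm =
  case tm of
    LetVal (_, v, l) => occsVal f v @ occs f l
  | LetCont (_, _, body, l) => occs f body @ occs f l
  | AppCont (_, x) => other f [x]
  | App (g, k, x) => (if g = f then [SOME k] else []) @ other f [x]
  | Case (x, _, _) => other f [x]
and occsVal f v =
  case v of
    Pair (x, y) => other f [x, y]
  | Lam (_, _, body) => occs f body

(* SOME k0 if f is only ever called, and always with continuation k0. *)
fun commonCont f tm =
  case occs f tm of
    SOME k0 :: rest =>
      if List.all (fn c => c = SOME k0) rest then SOME k0 else NONE
  | _ => NONE

(* Scope of f in the example from the start of this section. *)
val scope =
  LetCont ("k0", "w", App ("g", "r", "w"),
  LetCont ("j1", "d1", App ("f", "k0", "y"),
  LetCont ("j2", "d2", App ("f", "k0", "d2"),
  Case ("d", "j1", "j2"))))

val answer = commonCont "f" scope
```

For the example, `answer` reports k0 as the common continuation, so f is a candidate for the CONT rewrite; a function that escapes (for instance by being stored in a pair) or that is called with two different continuations yields `NONE`.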

The CONT rewrite combines three actions: (a) the function f
is replaced by a continuation j, with each application replaced
by a continuation application; (b) the common continuation k0 is
substituted for the formal continuation parameter k in the body K
of f; and (c) the new continuation j is pulled into the scope
of the continuation k0. The multi-hole context D is the smallest
context enclosing all uses of f, which ensures that j is in scope
after transformation. The analysis is trivial (just check call sites for
common continuation arguments), yet iterating this transformation
leads to optimal contification, in the sense of Fluet and Weeks
(2001). Here is an example adapted from loc. cit. §5.2,



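\[
\begin{array}{l}
\mathbf{letval}\ h = \lambda k_h\,x_h.\ \cdots\ \mathbf{in} \\
\mathbf{letval}\ g_1 = \lambda k_1\,x_1.\ \cdots\ h\,k_1\,z_1\ \cdots\ k_1\,z_8\ \cdots\ \mathbf{in} \\
\mathbf{letval}\ g_2 = \lambda k_2\,x_2.\ \cdots\ h\,k_2\,z_2\ \cdots\ \mathbf{in} \\
\mathbf{letval}\ f = \lambda k_f\,x_f.\ \cdots\ g_1\,k_f\,z_3\ \cdots\ g_2\,k_f\,z_4\ \cdots\ g_2\,k_f\,z_5\ \cdots\ \mathbf{in} \\
\mathbf{letval}\ m = \lambda k_m\,x_m.\ \cdots\ f\,j_1\,z_6\ \cdots\ f\,j_2\,z_7\ \cdots\ \mathbf{in}\ \cdots
\end{array}
\]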



We can immediately see that $g_1$ and $g_2$ (but not $h$) are always
passed the same continuation $k_f$, and so we can apply CONT to
contify them both:

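\[
\begin{array}{l}
\mathbf{letval}\ h = \lambda k_h\,x_h.\ \cdots\ \mathbf{in} \\
\mathbf{letval}\ f = \lambda k_f\,x_f. \\
\quad (\mathbf{letcont}\ k_{g_1}\,x_1 = \cdots\ h\,k_f\,z_1\ \cdots\ k_f\,z_8\ \cdots\ \mathbf{in} \\
\quad\ \,\mathbf{letcont}\ k_{g_2}\,x_2 = \cdots\ h\,k_f\,z_2\ \cdots\ \mathbf{in} \\
\quad\ \,\cdots\ k_{g_1}\,z_3\ \cdots\ k_{g_2}\,z_4\ \cdots\ k_{g_2}\,z_5\ \cdots)\ \mathbf{in} \\
\mathbf{letval}\ m = \lambda k_m\,x_m.\ \cdots\ f\,j_1\,z_6\ \cdots\ f\,j_2\,z_7\ \cdots\ \mathbf{in}\ \cdots
\end{array}
\]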

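Now $h$ can be contified as it is always passed $k_f$.

\[
\begin{array}{l}
\mathbf{letval}\ f = \lambda k_f\,x_f. \\
\quad (\mathbf{letcont}\ k_h\,x_h = \cdots\ \mathbf{in} \\
\quad\ \,\mathbf{letcont}\ k_{g_1}\,x_1 = \cdots\ k_h\,z_1\ \cdots\ k_f\,z_8\ \cdots\ \mathbf{in} \\
\quad\ \,\mathbf{letcont}\ k_{g_2}\,x_2 = \cdots\ k_h\,z_2\ \cdots\ \mathbf{in} \\
\quad\ \,\cdots\ k_{g_1}\,z_3\ \cdots\ k_{g_2}\,z_4\ \cdots\ k_{g_2}\,z_5\ \cdots)\ \mathbf{in} \\
\mathbf{letval}\ m = \lambda k_m\,x_m.\ \cdots\ f\,j_1\,z_6\ \cdots\ f\,j_2\,z_7\ \cdots\ \mathbf{in}\ \cdots
\end{array}
\]
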
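The iteration used in this example can be sketched in Python. This is a minimal model under stated assumptions, not the implementation described in the paper: a program is flattened to a table mapping each function name to its formal continuation and the call sites (callee, continuation argument) in its body; continuation names are assumed unique, so a shared argument implies a single enclosing function; and the names `contify_once` and `contify_all` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Call = Tuple[str, str]                       # (callee, continuation argument)
Program = Dict[str, Tuple[str, List[Call]]]  # name -> (formal continuation, calls in body)


def contify_once(prog: Program) -> Optional[str]:
    """Apply one CONT-like step in place; return the contified name, or None."""
    for f, (k_f, body_f) in list(prog.items()):
        # Every call site of f, tagged with the function whose body contains it.
        sites = [(host, k) for host, (_, body) in prog.items()
                 for g, k in body if g == f]
        hosts = {host for host, _ in sites}
        conts = {k for _, k in sites}
        # Need one common continuation, one enclosing function, no recursion.
        if len(conts) != 1 or len(hosts) != 1 or f in hosts:
            continue
        (k0,), (host,) = conts, hosts
        k_host, body_host = prog[host]
        # Substitute k0 for the formal continuation k_f in the body of f ...
        moved = [(g, k0 if k == k_f else k) for g, k in body_f]
        # ... and move that body into the host; calls to f become jumps.
        prog[host] = (k_host, [c for c in body_host if c[0] != f] + moved)
        del prog[f]
        return f
    return None


def contify_all(prog: Program) -> List[str]:
    """Iterate until no rewrite applies; return names in contification order."""
    order = []
    while (f := contify_once(prog)) is not None:
        order.append(f)
    return order


# Call structure of the example above (the elided parts are ignored).
example: Program = {
    "h":  ("kh", []),
    "g1": ("k1", [("h", "k1")]),
    "g2": ("k2", [("h", "k2")]),
    "f":  ("kf", [("g1", "kf"), ("g2", "kf"), ("g2", "kf")]),
    "m":  ("km", [("f", "j1"), ("f", "j2")]),
}
print(contify_all(example))   # ['g1', 'g2', 'h']
print(sorted(example))        # ['f', 'm']
```

In this model h only becomes contifiable after g1 and g2 have been moved into f, which is why the rewrite has to be iterated.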
5.2 Recursive functions 

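Generalizing to recursive functions and continuations is a little
trickier. Suppose we have a $\lambda^T_{CPS}$ term of the form

\[
\begin{array}{l}
\mathbf{letfun}\ f_1\,k_1\,h_1\,x_1 = K_1 \\
\qquad\ \ \vdots \\
\qquad\ \ f_n\,k_n\,h_n\,x_n = K_n \\
\mathbf{in}\ K.
\end{array}
\]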

A set of functions $F \subseteq \{f_1, \ldots, f_n\}$ can be contified
collectively, written $\mathit{Contifiable}(F)$, if there is some pair
of continuations $k_0$ and $h_0$ such that each occurrence of
$f \in F$ is either a tail call within $F$ or is a call with
continuation arguments $k_0$ and $h_0$. Intuitively, each function
(eventually) returns to the same place ($k_0$), or throws an exception
that is caught by the same handler ($h_0$), though control may pass
tail-recursively through other functions in $F$. There may be many such
subsets $F$; we assume that $F$ is in fact strongly-connected with
respect to tail calls contained within it (or is a trivial singleton
with no tail calls). Then for a given letfun term there is a unique
partial partition of the functions into disjoint subsets satisfying
$\mathit{Contifiable}(-)$.

Let $F = \{f_1, \ldots, f_m\}$. Define a translation on function
applications

\[
(f\,k\,h\,x)^* =
\begin{cases}
j_i\,x & \text{if } f = f_i \in F \\
f\,k\,h\,x & \text{otherwise}
\end{cases}
\]

and extend this to all terms. Assuming that $\mathit{Contifiable}(F)$
holds, there are two possibilities.

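1. All applications of the form $f\,k_0\,h_0\,x$ for $f \in F$ are in
the term $K$. Then we can apply the following rewrite, which is the
direct analogue of CONT.

RECCONT ($f_1, \ldots, f_m$ not free in $C$, and $K$ minimal):

\[
\begin{array}{l}
\mathbf{letfun}\ f_1\,k_1\,h_1\,x_1 = K_1 \\
\qquad\ \ \vdots \\
\qquad\ \ f_n\,k_n\,h_n\,x_n = K_n \\
\mathbf{in}\ C[K] \\[1ex]
\longrightarrow \\[1ex]
\mathbf{letfun}\ f_{m+1}\,k_{m+1}\,h_{m+1}\,x_{m+1} = K_{m+1} \\
\qquad\ \ \vdots \\
\qquad\ \ f_n\,k_n\,h_n\,x_n = K_n \\
\mathbf{in}\ C[\mathbf{letcont}\ j_1\,x_1 = K_1^*[k_0/k_1, h_0/h_1] \\
\qquad\qquad\qquad \vdots \\
\qquad\qquad\quad\ \ j_m\,x_m = K_m^*[k_0/k_m, h_0/h_m] \\
\qquad\ \mathbf{in}\ K^*]
\end{array}
\]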

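2. Otherwise, all applications of the form $f\,k_0\,h_0\,x$ for
$f \in F$ are in the body of one of the functions outside of $F$;
without loss of generality we assume this is $f_n$.

RECCONT2 ($f_1, \ldots, f_m$ not free in $C$, and $K_n$ minimal):

\[
\begin{array}{l}
\mathbf{letfun}\ f_1\,k_1\,h_1\,x_1 = K_1 \\
\qquad\ \ \vdots \\
\qquad\ \ f_{n-1}\,k_{n-1}\,h_{n-1}\,x_{n-1} = K_{n-1} \\
\qquad\ \ f_n\,k_n\,h_n\,x_n = C[K_n] \\
\mathbf{in}\ K \\[1ex]
\longrightarrow \\[1ex]
\mathbf{letfun}\ f_{m+1}\,k_{m+1}\,h_{m+1}\,x_{m+1} = K_{m+1} \\
\qquad\ \ \vdots \\
\qquad\ \ f_{n-1}\,k_{n-1}\,h_{n-1}\,x_{n-1} = K_{n-1} \\
\qquad\ \ f_n\,k_n\,h_n\,x_n = \\
\qquad\qquad C[\mathbf{letcont}\ j_1\,x_1 = K_1^*[k_0/k_1, h_0/h_1] \\
\qquad\qquad\qquad\qquad \vdots \\
\qquad\qquad\qquad\quad\ \ j_m\,x_m = K_m^*[k_0/k_m, h_0/h_m] \\
\qquad\qquad\ \ \mathbf{in}\ K_n^*] \\
\mathbf{in}\ K
\end{array}
\]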

For an example of the latter, more complex, transformation, 
consider the following SML code: 

let fun unif (Ap(a,xs), Ap(b,ys)) = (unif (a,b); unifV (xs,ys))
      | unif (Ar(a,b), Ar(c,d)) = unifV ([a,b], [c,d])
    and unifV (x::xs, y::ys) = (unif (x,y); unifV (xs,ys))
      | unifV ([], []) = ()
in unif end

The function unifV can be contified into the definition of unif: it
tail-calls itself, and its uses inside unif have the same continuation.
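The Contifiable(F) condition and the translation on applications can be checked on the call structure of this example with a short Python sketch. Everything in it is an assumption made for illustration: the CPS translation is not shown in the text, so the continuation names (`k_unif`, `h_unif`, `j1`, `j2`, and so on) are hypothetical, as are the helpers `contifiable` and `star`. The sketch also does not verify that F is strongly connected with respect to tail calls.

```python
from typing import AbstractSet, Dict, List, NamedTuple, Optional, Tuple


class Fun(NamedTuple):
    k: str                             # formal return continuation
    h: str                             # formal handler continuation
    calls: List[Tuple[str, str, str]]  # (callee, return arg, handler arg)


def contifiable(prog: Dict[str, Fun], F: AbstractSet[str],
                escaping: AbstractSet[str] = frozenset()
                ) -> Optional[Tuple[str, str]]:
    """Return the common pair (k0, h0) if Contifiable(F) holds, else None."""
    if F & escaping:                   # a first-class use is not a call
        return None
    targets = set()
    for name, fun in prog.items():
        for callee, k, h in fun.calls:
            if callee not in F:
                continue
            if name in F and (k, h) == (fun.k, fun.h):
                continue               # tail call within F
            targets.add((k, h))
    return next(iter(targets)) if len(targets) == 1 else None


def star(fun: Fun, F: AbstractSet[str]) -> List[Tuple[str, ...]]:
    """Translate the calls of one body: calls to members of F become jumps."""
    return [("jump", "j_" + callee) if callee in F
            else ("call", callee, k, h)
            for callee, k, h in fun.calls]


# Assumed call structure of the SML code above after CPS translation.
prog = {
    "unif": Fun("k_unif", "h_unif", [
        ("unif", "j1", "h_unif"),         # unif (a,b), a non-tail call
        ("unifV", "k_unif", "h_unif"),    # unifV (xs,ys), a tail call
        ("unifV", "k_unif", "h_unif"),    # unifV ([a,b],[c,d]), a tail call
    ]),
    "unifV": Fun("k_unifV", "h_unifV", [
        ("unif", "j2", "h_unifV"),        # unif (x,y), a non-tail call
        ("unifV", "k_unifV", "h_unifV"),  # unifV (xs,ys), a tail call
    ]),
}
escaping = {"unif"}                       # unif is returned by "in unif end"

print(contifiable(prog, {"unifV"}, escaping))  # ('k_unif', 'h_unif')
print(contifiable(prog, {"unif"}, escaping))   # None
print(star(prog["unif"], {"unifV"}))
```

Because every use of unifV outside F sits in the body of unif, this is the situation handled by RECCONT2.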

5.3 Comparing dominator-based contification

The dominator-based approach of Fluet and Weeks (2001) can be
recast in our CPS language as follows. (For simplicity we do not
consider exception handler continuations here.) First construct a
continuation flow graph for the whole program. Nodes consist of
continuation variables and a distinguished root node. Then for each
function $f$ with return continuation $k$, if $f$ is passed around as a
first-class value then create an edge from root to $k$; otherwise, for
each application $f\,j\,x$ create an edge from $j$ to $k$. Finally, for
each local continuation $k$ create an edge from root to $k$.
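The construction just described can be sketched in Python as follows. The input encoding (a table of return continuations, a list of applications, and a list of local continuations) and the name `flow_graph` are assumptions made for illustration. The sample data encodes the example of Section 5.1, further assuming that j1 and j2 are local continuations and that m escapes, since its uses are elided there.

```python
from typing import Dict, Iterable, List, Set, Tuple

ROOT = "root"


def flow_graph(functions: Dict[str, str],
               applications: List[Tuple[str, str]],
               local_conts: Iterable[str],
               escaping: Iterable[str] = ()) -> Set[Tuple[str, str]]:
    """Build the edge set of the continuation flow graph.

    functions:    function name -> its return continuation k
    applications: (f, j) for each application  f j x
    local_conts:  names of local continuations
    escaping:     functions passed around as first-class values
    """
    escaping = set(escaping)
    edges = set()
    for f, k in functions.items():
        if f in escaping:
            edges.add((ROOT, k))           # unknown callers: root -> k
        else:
            for g, j in applications:
                if g == f:
                    edges.add((j, k))      # application f j x: j -> k
    for k in local_conts:
        edges.add((ROOT, k))               # local continuation: root -> k
    return edges


functions = {"h": "kh", "g1": "k1", "g2": "k2", "f": "kf", "m": "km"}
applications = [("h", "k1"), ("h", "k2"), ("g1", "kf"),
                ("g2", "kf"), ("g2", "kf"), ("f", "j1"), ("f", "j2")]

for edge in sorted(flow_graph(functions, applications, ["j1", "j2"], ["m"])):
    print(edge)
```

Duplicate applications such as the two calls of g2 with kf contribute a single edge, because the graph is stored as a set of edges.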

The non-recursive CONT rewrite has the effect of merging two 
nodes in the graph, as follows: 




The recursive RECCONT and RECCONT2 rewrites are similar,
except that in place of $k$ we have a strongly-connected component
$\{k_1, \ldots, k_m\}$.




Conversely, any part of the flow graph matching the left-hand side
of this diagram corresponds to a contifiable subset of functions in a
letfun to which the RECCONT or RECCONT2 rules can be applied.

It is immediately clear that exhaustive rewriting terminates, 
as the flow graph decreases in size with each rewrite, eventually 
producing a graph with no occurrences of the pattern above. 

The algorithm described by Fluet and Weeks (2001) contifies $k$
if it is strictly dominated by some continuation $j$ whose immediate
dominator is root. It can be shown that if a rooted graph contains
such a pair of nodes $j$ and $k$, then some part of the graph matches
the pattern above. Hence exhaustive rewriting has the same effect
as optimal contification based on dominator trees.
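The dominator criterion stated above can be tried out with a small Python sketch. It uses a plain iterative dominator computation rather than any particular algorithm from Fluet and Weeks, it assumes every node is reachable from root, and the function names are hypothetical. The edge set is an assumed encoding of the flow graph for the example of Section 5.1.

```python
from typing import Dict, Set, Tuple

ROOT = "root"
Edges = Set[Tuple[str, str]]


def dominators(edges: Edges, root: str = ROOT) -> Dict[str, Set[str]]:
    """Map each node to the set of nodes that dominate it (itself included)."""
    nodes = {n for edge in edges for n in edge}
    preds = {n: {a for a, b in edges if b == n} for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for n in nodes - {root}:
            new = {n}
            if preds[n]:
                new |= set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominator(dom: Dict[str, Set[str]], n: str) -> str:
    """The strict dominator of n that is itself dominated by all the others."""
    return max(dom[n] - {n}, key=lambda d: len(dom[d]))


def contified_by_dominators(edges: Edges, root: str = ROOT) -> Set[str]:
    """Nodes k strictly dominated by some j whose immediate dominator is root."""
    dom = dominators(edges, root)
    result = set()
    for k in dom:
        if k == root:
            continue
        for j in dom[k] - {k, root}:
            if immediate_dominator(dom, j) == root:
                result.add(k)
    return result


# Assumed flow graph for the example of Section 5.1.
EDGES = {
    ("k1", "kh"), ("k2", "kh"), ("kf", "k1"), ("kf", "k2"),
    ("j1", "kf"), ("j2", "kf"),
    (ROOT, "j1"), (ROOT, "j2"), (ROOT, "km"),
}
print(sorted(contified_by_dominators(EDGES)))   # ['k1', 'k2', 'kh']
```

Under these assumptions the selected continuations k1, k2 and kh belong to g1, g2 and h, the same functions that repeated application of CONT contifies in Section 5.1.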

6. Related work and conclusion 

The use of continuation-passing style for functional languages has
its origins in Scheme compilers (Steele 1978; Kranz et al. 1986).
It later formed the basis of the Standard ML of New Jersey
compiler (Appel 1992; Shao and Appel 1995).

In early compilers, lambdas originating from the CPS transformation
were not distinguished from lambdas present in the source,
so some effort was expended at code generation time to determine
which lambdas could be stack-allocated and which could be
heap-allocated. Later compilers made a syntactic distinction between
true functions and 'second-class' continuations introduced by CPS,
and sometimes transformed one into the other (Kelsey and Hudak
1989), though contification was not studied formally.

A number of more recent compilers use what has been called
almost CPS. The Sequentialized Intermediate Language (SIL)
employed by Tolmach and Oliva (1998) is a monadic-style language in
which a letcont-like feature is used to introduce join points.
Somewhat closer to our CPS language is the First Order Language (FOL)
of the MLton compiler (Fluet and Weeks 2001). It goes further than
SIL in making use of named local continuations in all branch
constructs and non-tail calls. However, functions are not parameterized
on return (or handler) continuations, and there is special syntax for
tail calls and returns. This non-uniform treatment of continuations
complicates transformations: inlining of non-tail functions must
replace all 'return points' with jumps, and the contification
analysis and transformation must treat tail and non-tail calls
differently.

We have found the uniform treatment of continuations in our
CPS language to be a real benefit, not only as a simplifying force in
implementation, but also in thinking about compiler optimizations:
contification, in particular, is difficult to characterize in the absence
of a notion of continuation passing.

As far as we are aware, we are the first to implement linear-time
shrinking reductions in the style of Appel and Jim (1997). An
earlier term-graph implementation by Lindley was for a monadic
language and had worst-case $O(n^2)$ behaviour due to commuting
conversions (Benton et al. 2004a; Lindley 2005). Shivers and Wand
(2005) have proposed a rather different graph representation for
lambda terms, with the goal of sharing subterms after $\beta$-reduction.
Their representation does bear some resemblance to ours, though,
with up-links from subterms to enclosing terms, and circular lists
that connect the sites where a term is substituted for a variable.

This paper would not be complete without a mention of Static
Single Assignment form (SSA), the currently fashionable intermediate
representation for imperative languages. As is well known,
SSA is in some sense equivalent to CPS (Kelsey 1995) and to
ANF (Appel 1998). Its focus is intra-procedural optimization (as
with ANF, it is necessary to renormalize when inlining functions,
in contrast to CPS) and there is a large body of work on such
optimizations. Future work is to transfer SSA-based optimizations to
CPS. We conjecture that CPS is a good fit for both functional and
imperative paradigms.

Acknowledgments 

I would like to thank Nick Benton, Olivier Danvy, Sam Lindley, 
Simon Peyton Jones and Claudio Russo for fruitful discussions on 
compiler intermediate languages. Georges Gonthier suggested the 
use of union-find in the graphical representation of terms. 

References 

Andrew W. Appel. Compiling with Continuations. Cambridge University 
Press, 1992. 

Andrew W. Appel. SSA is functional programming. SIGPLAN Notices, 33 
(4): 17-20, 1998. 

Andrew W. Appel and Trevor Jim. Shrinking lambda expressions in linear 
time. Journal of Functional Programming, 7(5):515-540, 1997.

Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge 
University Press, 1998. 

Nick Benton and Peter Buchlovsky. Semantics of an effect analysis for 
exceptions. In ACM SIGPLAN International Workshop on Types in 
Language Design and Implementation (TLDI), pages 15-26, 2007. 

Nick Benton and Andrew Kennedy. Exceptional syntax. Journal of Func- 
tional Programming, 11(4):395-410, 2001. 

Nick Benton, Andrew Kennedy, and George Russell. Compiling Standard 
ML to Java bytecodes. In 3rd ACM SIGPLAN International Conference 
on Functional Programming. ACM Press, September 1998. 

Nick Benton, Andrew Kennedy, Sam Lindley, and Claudio Russo. Shrink- 
ing reductions in SML.NET. In 16th International Workshop on Imple- 
mentation and Application of Functional Languages (IFL), 2004a. 

Nick Benton, Andrew Kennedy, and Claudio Russo. Adventures in interop- 
erability: The SML.NET experience. In 6th International Conference on 
Principles and Practice of Declarative Programming (PPDP), 2004b. 

Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein. 
Introduction to Algorithms. MIT Press, second edition, 2001. 

Olivier Danvy. A new one-pass transformation into monadic normal form. 
In 12th International Conference on Compiler Construction (CC'03), 
2003. 

Olivier Danvy and Andrzej Filinski. Representing control: A study of the 
CPS transformation. Mathematical Structures in Computer Science, 2 
(4):361-391, 1992. 

Olivier Danvy and Lasse R. Nielsen. A first-order one-pass CPS
transformation. Theor. Comput. Sci., 308(1-3):239-257, 2003.



Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. 
The essence of compiling with continuations (with retrospective). In 
McKinley (2004), pages 502-514. 

Matthew Fluet and Stephen Weeks. Contification using dominators. In 
ICFP'01: Proceedings of the Sixth ACM SIGPLAN International Con- 
ference on Functional Programming, pages 2-13. ACM Press, Septem- 
ber 2001. 

John Hatcliff and Olivier Danvy. A generic account of continuation-passing 
styles. In Principles of Programming Languages (POPL), pages 458- 
471, 1994. 

Haim Kaplan, Nira Shafrir, and Robert E. Tarjan. Union-find with deletions. 
In SODA '02: Proceedings of the thirteenth annual ACM-SIAM sympo- 
sium on Discrete algorithms, pages 19-28, Philadelphia, PA, USA, 2002. 
Society for Industrial and Applied Mathematics. ISBN 0-89871-513-X. 

Richard Kelsey. A correspondence between continuation passing style 
and static single assignment form. In Intermediate Representations 
Workshop, pages 13-23, 1995. 

Richard A. Kelsey and Paul Hudak. Realistic compilation by program 
transformation. In Principles of Programming Languages (POPL). 
ACM, January 1989. 

Jung-taek Kim, Kwangkeun Yi, and Olivier Danvy. Assessing the overhead 
of ML exceptions by selective CPS transformation. In ACM SIGPLAN 
Workshop on ML, pages 112-119, 1998. Also appears as BRICS
technical report RS-98-15.

David A. Kranz, Richard A. Kelsey, Jonathan A. Rees, Paul Hudak, and 
James Philbin. ORBIT: an optimizing compiler for scheme. In Proceed- 
ings of the ACM SIGPLAN symposium on Compiler Construction, pages 
219-233, June 1986. 

Sam Lindley. Normalisation by evaluation in the compilation of typed func- 
tional programming languages. PhD thesis, University of Edinburgh, 
2005. 

Kathryn S. McKinley, editor. 20 Years of the ACM SIGPLAN Conference 
on Programming Language Design and Implementation 1979-1999, A 
Selection, 2004. ACM. 

Eugenio Moggi. Notions of computation and monads. Information and 
Computation, 93:55-92, 1991. 

A. M. Pitts. Typed operational reasoning. In B. C. Pierce, editor, Advanced 
Topics in Types and Programming Languages, chapter 7, pages 245-289. 
The MIT Press, 2005. 

Amr Sabry and Philip Wadler. A reflection on call-by-value. ACM
Transactions on Programming Languages and Systems (TOPLAS),
19(6):916-941, November 1997. ISSN 0164-0925.

Zhong Shao and Andrew W. Appel. A type-based compiler for Standard 
ML. In Proc. 1995 ACM SIGPLAN Conference on Programming Lan- 
guage Design and Implementation (PLDI), pages 116-129, La Jolla, CA, 
Jun 1995. 

Olin Shivers. Higher-order control-flow analysis in retrospect: lessons 
learned, lessons abandoned (with retrospective). In McKinley (2004), 
pages 257-269. 

Olin Shivers and Mitchell Wand. Bottom-up β-reduction: Uplinks and
λ-DAGs. In European Symposium on Programming (ESOP), pages
217-232, 2005.

Guy L. Steele. RABBIT: A compiler for SCHEME. Technical Report AI- 
TR-474, MIT, May 1978. 

Hayo Thielecke. Comparing control constructs by double-barrelled CPS. 
Higher-Order and Symbolic Computation, 15(2/3):141-160, 2002. 

Andrew P. Tolmach and Dino Oliva. From ML to Ada: Strongly-typed 
language interoperability via source translation. Journal of Functional 
Programming, 8(4):367-412, 1998. 

Philip Wadler and Peter Thiemann. The marriage of effects and monads. In 
ACM SIGPLAN International Conference on Functional Programming 
(ICFP), 1998. 



